Hi Cesar,

This can definitely be done using a custom parse plugin and an indexing
plugin. We did something like this some time ago to classify adult pages
using our text classification API
(http://code.google.com/p/textclassification/), which is based on SVM.
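
As a rough sketch of how the work splits between the two plugins: the parse
step classifies the page text and stores the label as metadata, and the
indexing step copies that label into a field of the indexed document. This is
plain Java with hypothetical names (CategoryPipelineSketch, TextClassifier,
parseStep, indexStep), not the actual Nutch plugin interfaces, and the keyword
rule is only a placeholder for a trained SVM model.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-ins for the two plugin hooks; NOT Nutch's real interfaces.
class CategoryPipelineSketch {

    /** Placeholder for a trained model (assumption: any text -> label function). */
    interface TextClassifier {
        String classify(String text);
    }

    /** Parse-time step: classify the page text and record the label as parse metadata. */
    static Map<String, String> parseStep(String pageText, TextClassifier classifier) {
        Map<String, String> parseMetadata = new HashMap<>();
        parseMetadata.put("category", classifier.classify(pageText));
        return parseMetadata;
    }

    /** Index-time step: copy the label from the parse metadata into the indexed document. */
    static Map<String, String> indexStep(String url, Map<String, String> parseMetadata) {
        Map<String, String> indexedDoc = new HashMap<>();
        indexedDoc.put("url", url);
        String category = parseMetadata.get("category");
        if (category != null) {
            indexedDoc.put("category", category);
        }
        return indexedDoc;
    }

    /** Toy keyword rule, used only to make the sketch runnable; it is not an SVM. */
    static final TextClassifier KEYWORD_CLASSIFIER = text ->
        text.toLowerCase(Locale.ROOT).contains("football") ? "sports" : "other";

    public static void main(String[] args) {
        Map<String, String> meta = parseStep("Football results for Saturday", KEYWORD_CLASSIFIER);
        Map<String, String> doc = indexStep("http://example.com/scores", meta);
        System.out.println(doc.get("category"));
    }
}
```

Keeping the classification in the parse step means the label is computed once
per page, and the indexing step only decides which field it ends up in.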

Out of interest, what categories are you planning to use and how will you
build the training corpus?

HTH

Julien

-- 
DigitalPebble Ltd

Open Source Solutions for Text Engineering
http://www.digitalpebble.com

On 6 July 2010 12:51, Luan Cestari <[email protected]> wrote:

> Nutch Developers,
>
> I'm in the last year of Computer Science and my graduation project is
> related to web search. The plan is to add a filter for a page's category to
> Nutch, in an attempt to use SVM to classify the crawled pages.
>
> So I ask you: do you think I'll have to change internals of Nutch or can
> this be done with plugins?
>
> Thanks.
>
> Best Regards,
> Luan Cestari
>
