Hi Cesar,

This can definitely be done using a custom parse plugin and an indexing plugin. We did something like this some time ago to classify adult pages using our text classification API (http://code.google.com/p/textclassification/), which is based on SVM.
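
As a rough illustration of how the work could be split between the two plugins, here is a minimal, self-contained sketch. Everything in it is an assumption for illustration: `ParseStage`, `IndexStage`, `PageClassifier` and the `category` key are hypothetical stand-ins, not Nutch's actual plugin interfaces, and the keyword check is only a placeholder for a trained SVM model. The idea is that the parse-time step classifies the page text and stores the label in the parse metadata, and the index-time step copies that label into a field of the indexed document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for a parse-time and an index-time extension point.
// These are NOT the real Nutch interfaces; they only mirror the two-stage flow.
interface ParseStage {
    void filter(String text, Map<String, String> parseMeta);
}

interface IndexStage {
    void filter(Map<String, String> parseMeta, Map<String, String> indexDoc);
}

// Placeholder for the classifier; a real plugin would call an SVM model here.
interface PageClassifier {
    String classify(String text);
}

// Parse-time step: classify the page text and record the label as metadata.
class CategoryParseStage implements ParseStage {
    static final String KEY = "category"; // hypothetical metadata/field name
    private final PageClassifier classifier;

    CategoryParseStage(PageClassifier classifier) {
        this.classifier = classifier;
    }

    public void filter(String text, Map<String, String> parseMeta) {
        parseMeta.put(KEY, classifier.classify(text));
    }
}

// Index-time step: copy the label from the parse metadata into the index document.
class CategoryIndexStage implements IndexStage {
    public void filter(Map<String, String> parseMeta, Map<String, String> indexDoc) {
        String category = parseMeta.get(CategoryParseStage.KEY);
        if (category != null) {
            indexDoc.put(CategoryParseStage.KEY, category);
        }
    }
}

class CategoryPipelineDemo {
    // Runs both steps on one page and returns the resulting index document.
    static Map<String, String> run(String text) {
        // Keyword stub standing in for the SVM classifier (assumption).
        PageClassifier keywordStub =
            t -> t.toLowerCase().contains("football") ? "sports" : "other";

        Map<String, String> parseMeta = new HashMap<>();
        new CategoryParseStage(keywordStub).filter(text, parseMeta);

        Map<String, String> indexDoc = new HashMap<>();
        new CategoryIndexStage().filter(parseMeta, indexDoc);
        return indexDoc;
    }

    public static void main(String[] args) {
        System.out.println(run("Football results from the weekend"));
    }
}
```

Keeping classification in the parse step and field creation in the index step means the label is computed once per page and can then be searched or filtered on like any other indexed field.
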
Out of interest, what categories are you planning to use and how will you build the training corpus?

HTH

Julien

--
DigitalPebble Ltd
Open Source Solutions for Text Engineering
http://www.digitalpebble.com

On 6 July 2010 12:51, Luan Cestari <[email protected]> wrote:
> Nutch Developers,
>
> I'm at the last year of Computer Science and my graduation project is
> related to web search. The plan is to add a filter of page's category to
> Nutch, in a attempt to use SVM to classify the crawled pages.
>
> So I ask you: do you think I'll have to change internals of Nutch or can
> this be done with plugins?
>
> Thanks.
>
> Best Regards,
> Luan Cestari

