Hi, 

I need to modify the Nutch Indexer class because for me it is very
useful to add some fields to the generated Lucene index. I was trying
and I find out that it is possible to add fields to the Document with
doc.addField() in the reduce function. My point is that for those fields
I need the html content of the webpage to process it, but it looks not
to be present yet in the Document because it throws a null pointer
exception with getField("content"), maybe that is not the correct way to
access it, or the correct place. So, How and where can I access to the
html content of the document to add a new field to the Lucene Document
and so on to the generated index?

Any advice will be very helpful, 


Thanks in advance. 

Javier.


Reply via email to