By the way, I'm using a modified version of the xpath-filter plugin. The problem is that Nutch returns only one parse per page, so when the plugin produces multiple parses, Nutch overwrites them and returns only the last one.

On 20/03/2015 20:54, Mahmoud Gzawi wrote:
Hi Sebastian,
Thanks for your reply,

I think the answer is:

- separate parse trees (DOM trees) for parts of a document,
  e.g., chapters, sections, tables, and other structural elements

Let's say I have an HTML page with several sections. I need to extract every section (using XPath) as its own parse and index it as a separate document. Each parse will have its own metadata, outlinks, content, title, and so on.
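
To make the idea concrete, here is a minimal sketch in plain Java (JDK XPath only, no Nutch classes). Everything in it is an assumption made for illustration: the SectionParse and SectionSplitter names, the "#section-N" key scheme, and the choice of h2 as the section title are all hypothetical. It only shows how one page could be split into several records with distinct keys, so that a later record does not overwrite an earlier one. How such records would be handed back to Nutch 2.x is not covered here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical holder for one section-level "parse"; not a Nutch class. */
final class SectionParse {
    final String title;
    final String text;
    final List<String> outlinks;

    SectionParse(String title, String text, List<String> outlinks) {
        this.title = title;
        this.text = text;
        this.outlinks = outlinks;
    }
}

/** Hypothetical helper: splits one well-formed XHTML page into per-section records. */
final class SectionSplitter {

    /**
     * Evaluates sectionXPath against the page and returns one record per
     * matching node, keyed by baseUrl plus an assumed "#section-N" suffix
     * so that every section gets a distinct key.
     */
    static Map<String, SectionParse> split(String baseUrl, String xhtml, String sectionXPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList sections =
                    (NodeList) xpath.evaluate(sectionXPath, doc, XPathConstants.NODESET);

            Map<String, SectionParse> parses = new LinkedHashMap<>();
            for (int i = 0; i < sections.getLength(); i++) {
                Node section = sections.item(i);

                // Assumption: the first h2 child of a section is its title.
                String title = xpath.evaluate("h2[1]", section).trim();

                // Collect the href of every link inside this section only.
                NodeList hrefs =
                        (NodeList) xpath.evaluate(".//a/@href", section, XPathConstants.NODESET);
                List<String> outlinks = new ArrayList<>();
                for (int j = 0; j < hrefs.getLength(); j++) {
                    outlinks.add(hrefs.item(j).getNodeValue());
                }

                String key = baseUrl + "#section-" + (i + 1);
                parses.put(key, new SectionParse(title, section.getTextContent().trim(), outlinks));
            }
            return parses;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not split page into sections", e);
        }
    }
}
```

For example, calling SectionSplitter.split(url, page, "//div[@class='section']") on a page with two such div elements would give two entries, one per section, each with its own title, text and outlinks. Real-world HTML is often not well-formed XML, so a tolerant HTML parser would be needed in front of this step.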

Thanks,

On 20/03/2015 20:38, Sebastian Nagel wrote:
Hi Mahmoud,

what is meant by "multiple parses"?

- separate parse trees (DOM trees) for parts of a document,
   e.g., chapters, sections, tables, and other structural elements
- interpreting the same documents with multiple parsers,
   e.g., different HTML parsers
- parses of multi-document containers, e.g. zip files

Thanks,
Sebastian


On 03/20/2015 02:52 PM, Mahmoud Gzawi wrote:
Hi everyone.

Is there any way to extract multiple parses from one page in Nutch 2.x? Can anyone give me hints on
where I should start digging?

Thanks.

