Hi Sebastian,
Thanks for your reply.

I think the answer is:

- separate parse trees (DOM trees) for parts of a document,
  e.g., chapters, sections, tables, and other structural elements

Let's say I have an HTML page with several sections. I need to extract every section (using XPath) as its own parse and index it as a separate document. Every parse would have its own metadata, outlinks, content, title, and so on.
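
To make that concrete, here is a rough sketch of the splitting step in plain Java (JDK XPath only, no Nutch classes). The class name SectionSplitter, the Section holder, the //div[@class='section'] expression and the "#fragment" URL scheme are all assumptions made up for illustration. It also assumes the page is well-formed XHTML, which real pages often are not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: splits one page into per-section "parses".
final class SectionSplitter {

    // Hypothetical holder for what each section should carry.
    static final class Section {
        final String url;            // page URL plus a fragment, so each section has a unique key
        final String title;
        final String text;
        final List<String> outlinks;

        Section(String url, String title, String text, List<String> outlinks) {
            this.url = url;
            this.title = title;
            this.text = text;
            this.outlinks = outlinks;
        }
    }

    static List<Section> split(String baseUrl, String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Assumed markup: every section is a <div class="section">.
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//div[@class='section']", doc, XPathConstants.NODESET);

            List<Section> sections = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element el = (Element) nodes.item(i);

                // Title: first <h2> directly under the section (empty string if none).
                String title = xpath.evaluate("string(h2[1])", el).trim();

                // Content: all text below the section, whitespace collapsed.
                String text = el.getTextContent().trim().replaceAll("\\s+", " ");

                // Outlinks: only the links that occur inside this section.
                NodeList hrefs = (NodeList) xpath.evaluate(
                        ".//a/@href", el, XPathConstants.NODESET);
                List<String> outlinks = new ArrayList<>();
                for (int j = 0; j < hrefs.getLength(); j++) {
                    outlinks.add(hrefs.item(j).getNodeValue());
                }

                // Use the id attribute as the fragment, or fall back to the position.
                String id = el.getAttribute("id");
                String url = baseUrl + "#" + (id.isEmpty() ? "section-" + i : id);

                sections.add(new Section(url, title, text, outlinks));
            }
            return sections;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not split page " + baseUrl, e);
        }
    }
}
```

Each Section would then have to be turned into its own indexed document. How to do that inside Nutch 2.x is exactly the part I am asking about.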

Thanks,

On 20/03/2015 20:38, Sebastian Nagel wrote:
Hi Mahmoud,

what is meant by "multiple parses"?

- separate parse trees (DOM trees) for parts of a document,
   e.g., chapters, sections, tables, and other structural elements
- interpreting the same documents with multiple parsers,
   e.g., different HTML parsers
- parses of multi-document containers, e.g. zip files

Thanks,
Sebastian


On 03/20/2015 02:52 PM, Mahmoud Gzawi wrote:
Hi everyone.

Is there any way to extract multiple parses from one page in Nutch 2.x? Can
anyone give hints on where I should start digging?

Thanks.
