Re: multiple parses from one page

Mahmoud Gzawi Fri, 20 Mar 2015 12:55:19 -0700

Hi Sebastian,
Thanks for your reply,

I think the answer is the first option:


- separate parse trees (DOM trees) for parts of a document,
  e.g., chapters, sections, tables, and other structural elements

Let's say I have an HTML page with several sections. I need to extract every section (using XPath) as its own parse and index it as a separate document. Every parse will have its own metadata, outlinks, content, title, ...
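
To make the idea concrete, here is a rough standalone sketch of what I mean. It is not Nutch code and does not use any Nutch API; it only assumes the page is well-formed XHTML and that every section is a `div` with class `section` (both are assumptions for illustration). The function name `split_sections`, the `#section-N` id scheme and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; assumed to be well-formed XHTML so the
# standard-library XML parser can read it.
PAGE = """<html><body>
<div class="section"><h2>First</h2><p>Alpha text. <a href="http://example.com/a">a</a></p></div>
<div class="section"><h2>Second</h2><p>Beta text. <a href="http://example.com/b">b</a></p></div>
</body></html>"""


def split_sections(xhtml, base_url, section_xpath=".//div[@class='section']"):
    """Return one record per section, each with its own title, content and outlinks."""
    root = ET.fromstring(xhtml)
    docs = []
    for i, sec in enumerate(root.findall(section_xpath)):
        # Title: text of the first <h2> inside the section, if there is one.
        heading = sec.find(".//h2")
        title = "".join(heading.itertext()).strip() if heading is not None else ""
        # Content: all text nodes in the section, whitespace-normalized.
        # Joining with a space also separates adjacent inline elements.
        content = " ".join(" ".join(sec.itertext()).split())
        # Outlinks: only the links that appear inside this section.
        outlinks = [a.get("href") for a in sec.iter("a") if a.get("href")]
        docs.append({
            # Made-up id scheme: page URL plus a per-section fragment.
            "id": f"{base_url}#section-{i}",
            "title": title,
            "content": content,
            "outlinks": outlinks,
        })
    return docs


docs = split_sections(PAGE, "http://example.com/page")
for doc in docs:
    print(doc["id"], doc["title"], doc["outlinks"])
```

Each record would then be indexed as a separate document. Giving every section a distinct id derived from the page URL is just one possible way to keep the documents apart in the index.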


Thanks,

On 20/03/2015 20:38, Sebastian Nagel wrote:

Hi Mahmoud,

what is meant by "multiple parses"?

- separate parse trees (DOM trees) for parts of a document,
   e.g., chapters, sections, tables, and other structural elements
- interpreting the same documents with multiple parsers,
   e.g., different HTML parsers
- parses of multi-document containers, e.g. zip files

Thanks,
Sebastian


On 03/20/2015 02:52 PM, Mahmoud Gzawi wrote:

Hi everyone.

Is there any way to extract multiple parses from one page in Nutch 2.x? Can
anyone give me a hint about where I should start digging?

Thanks.
