Re: multiple parses from one page

Mahmoud Gzawi Fri, 20 Mar 2015 13:02:10 -0700

By the way, I'm using a modification of the xpath-filter plugin. The problem is that Nutch returns only one parse, so when I have multiple parses in the plugin, Nutch overwrites all of them and returns only the last one.


On 20/03/2015 20:54, Mahmoud Gzawi wrote:

Hi Sebastian,
Thanks for your reply,
I think the answer is:

- separate parse trees (DOM trees) for parts of a document,
  e.g., chapters, sections, tables, and other structural elements
Let's say I have an HTML page with several sections. I need to extract (using XPath) every section as a parse and index it as a separate document; every parse will have its own metadata, outlinks, content, title ...
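
To make the idea concrete, here is a rough, standalone sketch of the splitting step using only the JDK's DOM and XPath classes. It is not Nutch code: SectionSplitter, SectionParse, the "#section-N" key scheme and the h2-as-title rule are all made up for illustration, and it assumes the page is well-formed XHTML (real-world HTML would need a tolerant HTML parser first). The point is only that each section gets its own key, so one section's data does not overwrite another's.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch or the xpath-filter plugin.
class SectionSplitter {

    /** Hypothetical holder for the data of one section (not a Nutch class). */
    static final class SectionParse {
        final String title;
        final String text;
        final List<String> outlinks;

        SectionParse(String title, String text, List<String> outlinks) {
            this.title = title;
            this.text = text;
            this.outlinks = outlinks;
        }
    }

    /**
     * Splits a well-formed XHTML string into one SectionParse per node
     * matched by sectionXPath. Each entry is stored under its own key
     * (baseUrl + "#section-N", an assumed naming scheme), so entries
     * cannot overwrite each other.
     */
    static Map<String, SectionParse> split(String baseUrl, String xhtml,
                                           String sectionXPath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        NodeList sections =
                (NodeList) xpath.evaluate(sectionXPath, doc, XPathConstants.NODESET);
        Map<String, SectionParse> parses = new LinkedHashMap<>();

        for (int i = 0; i < sections.getLength(); i++) {
            Node section = sections.item(i);

            // Assumption: the first <h2> child of a section is its title.
            String title = xpath.evaluate("h2[1]", section).trim();

            // Collect the href of every link inside this section only.
            NodeList hrefs = (NodeList) xpath.evaluate(
                    ".//a/@href", section, XPathConstants.NODESET);
            List<String> outlinks = new ArrayList<>();
            for (int j = 0; j < hrefs.getLength(); j++) {
                outlinks.add(hrefs.item(j).getNodeValue());
            }

            // Plain text of the section with whitespace runs collapsed.
            String text = section.getTextContent().trim().replaceAll("\\s+", " ");

            parses.put(baseUrl + "#section-" + (i + 1),
                    new SectionParse(title, text, outlinks));
        }
        return parses;
    }
}
```

Calling SectionSplitter.split(url, xhtml, "//div[@class='section']") would give one map entry per matching div. How to hand such a map back to Nutch 2.x so that every entry is indexed as its own document is exactly the part I'm still missing.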
Thanks,

On 20/03/2015 20:38, Sebastian Nagel wrote:
Hi Mahmoud,

what is meant by "multiple parses"?

- separate parse trees (DOM trees) for parts of a document,
   e.g., chapters, sections, tables, and other structural elements
- interpreting the same documents with multiple parsers,
   e.g., different HTML parsers
- parses of multi-document containers, e.g. zip files

Thanks,
Sebastian


On 03/20/2015 02:52 PM, Mahmoud Gzawi wrote:
Hi everyone.
Is there any way to extract multiple parses from one page in Nutch 2.x? Can anyone give hints on where
I should start digging?

Thanks.
