Hi Sebastian,
Thanks for your reply.
I think the answer is the first option:
- separate parse trees (DOM trees) for parts of a document,
e.g., chapters, sections, tables, and other structural elements
Let's say I have an HTML page with several sections. I need to extract
each section (using XPath) as its own parse and index it as a separate
document. Each parse will have its own metadata, outlinks, content,
title, and so on.
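
To make the idea concrete, here is a rough sketch of the splitting step.
It uses only the JDK's XML and XPath classes, not any Nutch API, and it
assumes the page is well-formed XHTML. The names SectionSplitter and
SectionParse, the idea that each section's title sits in its first h2,
and the XPath expression passed in are all assumptions made up for
illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: splits one page into per-section records.
final class SectionSplitter {

    // Hypothetical holder for one per-section "parse".
    public static final class SectionParse {
        public final String title;
        public final String content;
        public final List<String> outlinks;

        SectionParse(String title, String content, List<String> outlinks) {
            this.title = title;
            this.content = content;
            this.outlinks = outlinks;
        }
    }

    // Returns one SectionParse per node matched by sectionXPath.
    // Assumes well-formed XHTML; real-world HTML would need a lenient parser first.
    public static List<SectionParse> split(String xhtml, String sectionXPath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList sections =
                    (NodeList) xpath.evaluate(sectionXPath, doc, XPathConstants.NODESET);

            List<SectionParse> parses = new ArrayList<>();
            for (int i = 0; i < sections.getLength(); i++) {
                Node section = sections.item(i);

                // Assumption: the section title is the text of its first h2 child.
                String title = xpath.evaluate("string(h2[1])", section).trim();

                // Collect the href of every link inside this section only.
                NodeList hrefs =
                        (NodeList) xpath.evaluate(".//a/@href", section, XPathConstants.NODESET);
                List<String> outlinks = new ArrayList<>();
                for (int j = 0; j < hrefs.getLength(); j++) {
                    outlinks.add(hrefs.item(j).getNodeValue());
                }

                parses.add(new SectionParse(title, section.getTextContent().trim(), outlinks));
            }
            return parses;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query the page", e);
        }
    }
}
```

Calling SectionSplitter.split(page, "//div[@class='section']") would give
one record per matching div. What I am missing is how to hand each of
these records to Nutch 2.x so that it gets indexed as its own document.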
Thanks,
On 20/03/2015 20:38, Sebastian Nagel wrote:
Hi Mahmoud,
what is meant by "multiple parses"?
- separate parse trees (DOM trees) for parts of a document,
e.g., chapters, sections, tables, and other structural elements
- interpreting the same documents with multiple parsers,
e.g., different HTML parsers
- parses of multi-document containers, e.g. zip files
Thanks,
Sebastian
On 03/20/2015 02:52 PM, Mahmoud Gzawi wrote:
Hi everyone.
Is there any way to extract multiple parses from one page in Nutch 2.x? Can
anyone give hints on where I should start digging?
Thanks.