[ https://issues.apache.org/jira/browse/NUTCH-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694564#comment-13694564 ]
Julien Nioche commented on NUTCH-1592:
--------------------------------------
Hi Seb,
That's a very plausible explanation. Ideally we should add a test to parse-tika
and parse-html to make sure that they produce the same DOM tree for the same
input. I believe the place to change in parse-tika would be
org.apache.nutch.parse.tika.DOMBuilder.
I'm not sure when I'll find the time to do that, but at least it's now in JIRA.
Thanks
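
PS: a rough sketch of what the comparison part of such a test could look like,
using only the JDK's DOM classes. Everything here is an assumption for
illustration: the class DomTreeComparison and its methods are hypothetical, not
existing Nutch code, and the step of obtaining the two DOM trees from parse-html
and parse-tika for the same input is left out. The signature includes the
namespace URI only so that such a difference would be visible if it exists; it
is not a claim about the actual cause of this issue.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch.
final class DomTreeComparison {

    // Renders the element structure of a subtree as a string such as
    // html(head(title),body(p)). Text, comment and attribute nodes are ignored.
    static String signature(Node node) {
        StringBuilder sb = new StringBuilder();
        // Prefix the namespace URI (if any) so namespace differences show up.
        String ns = node.getNamespaceURI();
        if (ns != null) {
            sb.append('{').append(ns).append('}');
        }
        sb.append(node.getNodeName());

        StringBuilder children = new StringBuilder();
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            if (children.length() > 0) {
                children.append(',');
            }
            children.append(signature(c));
        }
        if (children.length() > 0) {
            sb.append('(').append(children).append(')');
        }
        return sb.toString();
    }

    // True if both subtrees have the same element structure and namespaces.
    static boolean sameTree(Node a, Node b) {
        return signature(a).equals(signature(b));
    }

    // Stand-in for a real parser: builds a namespace-aware DOM from
    // well-formed XML so the helper can be tried without Nutch on the classpath.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse test input", e);
        }
    }
}
```

In a real test the two nodes passed to sameTree would be the root nodes
produced by each parser plugin; comparing the signature strings rather than
just a boolean gives a readable diff in the failure message.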
> XPath works on documents parsed with parse-html but not parse-tika
> ------------------------------------------------------------------
>
> Key: NUTCH-1592
> URL: https://issues.apache.org/jira/browse/NUTCH-1592
> Project: Nutch
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.6
> Reporter: Julien Nioche
> Fix For: 1.8
>
>
> The title says it all: the behaviour should be the same regardless of which
> parser is used.