[
https://issues.apache.org/jira/browse/NUTCH-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694277#comment-13694277
]
Sebastian Nagel commented on NUTCH-1592:
----------------------------------------
XML and XPath are case-sensitive. Neko (used by default by parse-html)
converts element names to uppercase and attribute names to lowercase
[[1|http://nekohtml.sourceforge.net/faq.html#uppercase]], following the [[DOM
spec|http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-5DFED1F0]].
Parse-tika behaves differently: its element names are lowercase. Could that be
the reason the same XPath expression matches with one parser but not the other?
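
To illustrate the hypothesis, here is a minimal sketch using only the JDK's
JAXP classes, not Nutch code. The two hand-written XML strings are assumptions
that stand in for the DOM trees produced by parse-html (uppercase element
names) and parse-tika (lowercase element names); the class and method names
are made up for the example. The last expression is one possible
case-insensitive form, not necessarily the fix this issue should adopt.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathCaseDemo {

    // Hypothetical stand-ins for the two parsers' output (assumption).
    static final String UPPER = "<HTML><HEAD><TITLE>Example</TITLE></HEAD></HTML>";
    static final String LOWER = "<html><head><title>Example</title></head></html>";

    // One possible parser-independent expression: lowercase the element
    // name inside the predicate before comparing it.
    static final String ANY_CASE_TITLE =
        "//*[translate(name(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', "
        + "'abcdefghijklmnopqrstuvwxyz') = 'title']";

    // Parses a well-formed XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the number of nodes the XPath expression selects.
    static int count(Document doc, String expr) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(expr, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document upper = parse(UPPER);
        Document lower = parse(LOWER);

        System.out.println(count(upper, "//TITLE"));      // 1
        System.out.println(count(upper, "//title"));      // 0
        System.out.println(count(lower, "//TITLE"));      // 0
        System.out.println(count(lower, "//title"));      // 1
        System.out.println(count(upper, ANY_CASE_TITLE)); // 1
        System.out.println(count(lower, ANY_CASE_TITLE)); // 1
    }
}
```

If the two parsers do differ only in element-name case, an expression written
as //TITLE would match the parse-html tree and silently select nothing on the
parse-tika tree, which fits the behaviour described in the issue.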
> XPath works on documents parsed with parse-html but not parse-tika
> ------------------------------------------------------------------
>
> Key: NUTCH-1592
> URL: https://issues.apache.org/jira/browse/NUTCH-1592
> Project: Nutch
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.6
> Reporter: Julien Nioche
> Fix For: 1.8
>
>
> The title says it all. The behaviour should be the same regardless of which
> parser is used.