[ 
https://issues.apache.org/jira/browse/NUTCH-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694277#comment-13694277
 ] 

Sebastian Nagel commented on NUTCH-1592:
----------------------------------------

XML and XPath are case-sensitive. Neko (used by default by parse-html) 
converts element names to uppercase and attribute names to lowercase 
[[1|http://nekohtml.sourceforge.net/faq.html#uppercase]], following the [[DOM 
spec|http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#ID-5DFED1F0]].
 Parse-tika behaves differently: its element names are lowercase. Is that the reason?
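
To illustrate the case-sensitivity point, here is a minimal sketch that uses 
only the JDK's javax.xml APIs, not Nutch code. The class XPathCaseDemo and its 
helpers buildDoc and count are hypothetical names, and the hand-built DOM trees 
merely imitate the uppercase (Neko-style) and lowercase (Tika-style) element 
names described above; they are not produced by either parser.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical demo class: shows that XPath name tests are case-sensitive.
class XPathCaseDemo {

    // One possible case-insensitive expression: lowercase the element name
    // with translate() before comparing it.
    static final String CASE_INSENSITIVE_TITLE =
        "//*[translate(name(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', "
        + "'abcdefghijklmnopqrstuvwxyz') = 'title']";

    // Builds <html><head><title>Example</title></head></html>, with element
    // names in uppercase (assumed Neko-like) or lowercase (assumed Tika-like).
    static Document buildDoc(boolean upperCase) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element html = doc.createElement(upperCase ? "HTML" : "html");
            Element head = doc.createElement(upperCase ? "HEAD" : "head");
            Element title = doc.createElement(upperCase ? "TITLE" : "title");
            title.setTextContent("Example");
            head.appendChild(title);
            html.appendChild(head);
            doc.appendChild(html);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the number of nodes selected by the XPath expression.
    static int count(Document doc, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document upper = buildDoc(true);
        Document lower = buildDoc(false);
        System.out.println("upper, //TITLE: " + count(upper, "//TITLE"));
        System.out.println("upper, //title: " + count(upper, "//title"));
        System.out.println("lower, //TITLE: " + count(lower, "//TITLE"));
        System.out.println("lower, //title: " + count(lower, "//title"));
        System.out.println("upper, case-insensitive: "
            + count(upper, CASE_INSENSITIVE_TITLE));
        System.out.println("lower, case-insensitive: "
            + count(lower, CASE_INSENSITIVE_TITLE));
    }
}
```

An expression written for one casing selects nothing in the other tree, which 
would match the reported symptom if the two parsers really do differ in this 
way. The translate() expression is only one possible workaround and is not 
necessarily the fix appropriate for this issue.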
                
> XPath works on documents parsed with parse-html but not parse-tika
> ------------------------------------------------------------------
>
>                 Key: NUTCH-1592
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1592
>             Project: Nutch
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Julien Nioche
>             Fix For: 1.8
>
>
> The title says it all. The behaviour should be the same regardless of which 
> parser is used.

