Nutch OpenSearch sometimes raises DOMExceptions due to Lucene column names not 
being valid XML tag names
--------------------------------------------------------------------------------------------------------

                 Key: NUTCH-906
                 URL: https://issues.apache.org/jira/browse/NUTCH-906
             Project: Nutch
          Issue Type: Bug
          Components: web gui
    Affects Versions: 1.1
         Environment: Debian GNU/Linux 64-bit
            Reporter: Asheesh Laroia


The Nutch FAQ explains that OpenSearch includes "all fields that are available 
at search result time." However, some Lucene field (column) names can start 
with a digit, and valid XML element names cannot. If Nutch generates OpenSearch 
results for a document that has such a field, the underlying Xerces library 
throws this exception: 

org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML 
character is specified. 

So I have written a patch that tests strings before they are used to generate 
tags within OpenSearch.
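
The patch itself is not shown in this message. As a rough illustration of the 
kind of check described above, here is a minimal sketch. The class name 
TagNameCheck and the method isSafeTagName are hypothetical and not taken from 
the Nutch code base. The sketch accepts only a conservative ASCII subset of 
the names XML allows (a letter or underscore first, then letters, digits, 
underscore, hyphen or period), so it may reject some names that are legal XML.

```java
// Hypothetical helper, not the actual NUTCH-906 patch.
class TagNameCheck {

    // True for the ASCII letters A-Z and a-z only.
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Returns true if 'name' is safe to use as an XML element name.
    // Assumption: a conservative ASCII-only rule is good enough here;
    // the full XML Name production also permits ':' and many non-ASCII
    // characters, which this sketch deliberately rejects.
    static boolean isSafeTagName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        if (!(isAsciiLetter(first) || first == '_')) {
            return false; // e.g. a field name that starts with a digit
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ok = isAsciiLetter(c)
                    || (c >= '0' && c <= '9')
                    || c == '_' || c == '-' || c == '.';
            if (!ok) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up field names, for illustration only.
        String[] fields = {"title", "2010_count", "my field"};
        for (String f : fields) {
            System.out.println(f + " -> " + isSafeTagName(f));
        }
    }
}
```

A caller could skip, or rename, any field for which isSafeTagName returns 
false before passing its name to the DOM API, instead of letting the 
DOMException propagate.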

I hope you merge this, or a better version of the patch!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
