Nutch OpenSearch sometimes raises DOMExceptions due to Lucene column names not
being valid XML tag names
--------------------------------------------------------------------------------------------------------
Key: NUTCH-906
URL: https://issues.apache.org/jira/browse/NUTCH-906
Project: Nutch
Issue Type: Bug
Components: web gui
Affects Versions: 1.1
Environment: Debian GNU/Linux 64-bit
Reporter: Asheesh Laroia
The Nutch FAQ explains that OpenSearch includes "all fields that are available
at search result time." However, some Lucene column names start with a digit,
and valid XML tag names cannot. When Nutch generates OpenSearch results for a
document that has a Lucene column whose name starts with a digit, the
underlying Xerces library throws this exception:
org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML
character is specified.
So I have written a patch that checks whether each string is a valid XML tag
name before it is used to generate a tag within OpenSearch.
I hope you merge this, or a better version of the patch!
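For illustration only, a check of this kind could look roughly like the sketch
below. It is not the attached patch: the class name XmlTagNames and the method
isValidTagName are hypothetical, and it assumes the JDK's standard DOM
implementation, which rejects bad names in createElement() with the same
DOMException quoted above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

/** Hypothetical helper; the names here are not taken from Nutch or the patch. */
final class XmlTagNames {
    private XmlTagNames() {}

    /**
     * Returns true if the DOM implementation accepts the string as an
     * element name, false if it is null, empty, or rejected.
     */
    static boolean isValidTagName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        try {
            // Let the DOM implementation decide, so the check agrees with
            // the code that would otherwise throw while building the response.
            newScratchDocument().createElement(name);
            return true;
        } catch (DOMException e) {
            // INVALID_CHARACTER_ERR: the name is not a legal XML name.
            return false;
        }
    }

    /** Builds a throwaway document per call; simple, but not optimized. */
    private static Document newScratchDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }
}
```

A caller would test each column name with XmlTagNames.isValidTagName(name)
before creating the corresponding element. What to do with a rejected name
(skip it, or rename it) is a separate decision that this sketch leaves open.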