[ https://issues.apache.org/jira/browse/NUTCH-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Asheesh Laroia updated NUTCH-906:
---------------------------------
Attachment: 0001-OpenSearch-If-a-Lucene-column-name-begins-with-a-num.patch
Patch, including a test
> Nutch OpenSearch sometimes raises DOMExceptions due to Lucene column names
> not being valid XML tag names
> --------------------------------------------------------------------------------------------------------
>
> Key: NUTCH-906
> URL: https://issues.apache.org/jira/browse/NUTCH-906
> Project: Nutch
> Issue Type: Bug
> Components: web gui
> Affects Versions: 1.1
> Environment: Debian GNU/Linux 64-bit
> Reporter: Asheesh Laroia
> Attachments:
> 0001-OpenSearch-If-a-Lucene-column-name-begins-with-a-num.patch
>
> Original Estimate: 0.33h
> Remaining Estimate: 0.33h
>
> The Nutch FAQ explains that OpenSearch includes "all fields that are
> available at search result time." However, some Lucene column names start
> with a digit, and valid XML tag names cannot. When Nutch generates OpenSearch
> results for a document that has a Lucene column whose name starts with a
> digit, the underlying Xerces library throws this exception:
> org.w3c.dom.DOMException: INVALID_CHARACTER_ERR: An invalid or illegal XML
> character is specified.
> So I have written a patch that tests strings before they are used to generate
> tags within OpenSearch.
> I hope you merge this, or a better version of the patch!
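
For illustration only, here is a minimal sketch of the kind of check the description mentions. It is not the contents of the attached patch, which is not reproduced in this message. The class name TagNameCheck and the method isValidTagName are hypothetical; the sketch assumes the standard JAXP DOM API and simply asks a Document to create an element with the candidate name, treating a DOMException as "not a valid tag name".

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Hypothetical helper, not taken from the NUTCH-906 patch.
class TagNameCheck {
    // Scratch document used only to probe candidate names.
    private static final Document SCRATCH = newDocument();

    private static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    // Returns true if the DOM implementation accepts the name as an
    // element name, false if it rejects it with a DOMException
    // (for example INVALID_CHARACTER_ERR for a leading digit).
    static boolean isValidTagName(String name) {
        try {
            SCRATCH.createElement(name);
            return true;
        } catch (DOMException e) {
            return false;
        }
    }
}
```

A caller generating OpenSearch output could run each Lucene column name through such a check and skip or rename the ones that fail, for example a name like "2ndAuthor". Which of those two the attached patch does is not stated here.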
--
This message is automatically generated by JIRA.