[jira] [Updated] (SOLR-2347) Use InputStream and not Reader for XML parsing

Mark Miller (JIRA) Mon, 31 Dec 2012 10:56:14 -0800

     [ 
https://issues.apache.org/jira/browse/SOLR-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Mark Miller updated SOLR-2347:
------------------------------

    Fix Version/s:     (was: 4.1)
                   5.0
                   4.2
    
> Use InputStream and not Reader for XML parsing
> ----------------------------------------------
>
>                 Key: SOLR-2347
>                 URL: https://issues.apache.org/jira/browse/SOLR-2347
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 4.2, 5.0
>
>
> Followup to SOLR-96:
> Solr mostly uses java.io.Reader and passes this Reader to the XML parser. 
> According to the XML spec, an XML file should initially be treated as a 
> binary stream with a default charset of UTF-8 or another charset given by the 
> network protocol (like the Content-Type header in HTTP). Importantly, this 
> default charset is only a "hint" to the parser; what is mandatory is the 
> charset from the XML declaration in the file header. Because of this, the 
> parser must be able to change the charset when reading the XML header 
> (possibly also when seeing BOM markers). This is not possible if the XML 
> parser gets a java.io.Reader instead of a java.io.InputStream. SOLR-96 already 
> fixed this for the XmlUpdateRequestHandler and the 
> DocumentAnalysisRequestHandler. This issue should fix the rest to conform to 
> the XML spec (open schema.xml, config.xml, and others as InputStream, not 
> Reader).
> This change would not break anything in Solr (perhaps only backwards 
> compatibility in the API), as the default used by XML parsers is UTF-8.
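
The difference described above can be illustrated with a minimal sketch. It is not Solr code: the class name XmlEncodingDemo, its helper methods, and the sample document are hypothetical, and it assumes the StAX parser bundled with the JDK. The same ISO-8859-1 encoded bytes are parsed twice, once as an InputStream (the parser reads the XML declaration and picks the charset itself) and once through a Reader that has already decoded the bytes as UTF-8 before the parser sees them.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; not part of Solr.
public class XmlEncodingDemo {

    // Sample document whose XML declaration names ISO-8859-1,
    // encoded to bytes in that same charset.
    static final byte[] LATIN1_XML =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc>caf\u00e9</doc>"
            .getBytes(StandardCharsets.ISO_8859_1);

    // Concatenates all character data found in the document.
    private static String readText(XMLStreamReader xml) throws XMLStreamException {
        StringBuilder sb = new StringBuilder();
        while (xml.hasNext()) {
            if (xml.next() == XMLStreamConstants.CHARACTERS) {
                sb.append(xml.getText());
            }
        }
        xml.close();
        return sb.toString();
    }

    // The parser receives raw bytes and can honor the declared charset.
    static String parseFromStream(byte[] bytes) {
        try {
            InputStream in = new ByteArrayInputStream(bytes);
            return readText(XMLInputFactory.newInstance().createXMLStreamReader(in));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // The bytes are decoded as UTF-8 before the parser sees them, so the
    // parser can no longer switch charsets based on the XML declaration.
    static String parseFromReader(byte[] bytes) {
        try {
            Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
            return readText(XMLInputFactory.newInstance().createXMLStreamReader(reader));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("InputStream: " + parseFromStream(LATIN1_XML));
        System.out.println("Reader:      " + parseFromReader(LATIN1_XML));
    }
}
```

In this sketch the InputStream variant recovers the original text, while the Reader variant loses the non-ASCII character because the charset was fixed before parsing began. Other parser implementations may behave differently on the Reader path, for example by rejecting the input outright.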

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
