[ 
https://issues.apache.org/jira/browse/XERCESC-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046487#comment-13046487
 ] 

Alberto Massari commented on XERCESC-1967:
------------------------------------------

No, I believe that RFC3023 is correct, but I am making a distinction between 
the parser and the code invoking the parser.
The XML parser is responsible for providing a parse(stream) function, and only 
knows what is written inside the stream; so, it expects a BOM, an encoding 
declaration and an XML-compliant sequence of bytes. If the BOM and/or the
encoding declaration is missing, it has its own fallback mechanism in place to
determine the encoding to be used in parsing. It obeys only the XML
specification.
It also allows the stream to state "this is the encoding you should use,
regardless of what you think", a setting that code outside the parser is
responsible for applying.
RFC3023 regulates how an HTTP transport can specify an encoding for the HTTP 
communication of an XML fragment, and is correct in saying that the HTTP
envelope takes precedence over the XML content. After all, it's the HTTP
transport that took the original payload and decided to re-encode it (case 8.20 
in the RFC), so the client should trust the HTTP content type more than the 
internal XML fragment. In the Xerces case, the NetAccessor is the piece of
code, external to the parser, that should take care of telling the stream
"this is your encoding, ignore what you find in the XML".
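
To make the division of responsibility concrete, here is a minimal sketch of
the precedence described above. It is not Xerces code: chooseEncoding and its
parameters are hypothetical names, an empty override string is assumed to mean
"no override", and the detection is simplified (no UTF-32 BOMs, and the XML
declaration is only searched in ASCII-compatible input).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, NOT part of Xerces-C++: returns the encoding a parser
// would use for `bytes`. `externalOverride` is what the calling code (for
// example, something that has read the HTTP charset parameter) sets on the
// stream; an empty string means no override was set.
std::string chooseEncoding(const std::string& bytes,
                           const std::string& externalOverride) {
    // 1. An encoding forced from outside the parser wins over everything.
    if (!externalOverride.empty())
        return externalOverride;

    // 2. Byte order mark (UTF-32 BOMs omitted for brevity).
    if (bytes.compare(0, 3, "\xEF\xBB\xBF") == 0) return "UTF-8";
    if (bytes.compare(0, 2, "\xFE\xFF") == 0) return "UTF-16BE";
    if (bytes.compare(0, 2, "\xFF\xFE") == 0) return "UTF-16LE";

    // 3. encoding pseudo-attribute of the XML declaration, if there is one.
    if (bytes.compare(0, 5, "<?xml") == 0) {
        const std::size_t end = bytes.find("?>");
        if (end != std::string::npos) {
            const std::string decl = bytes.substr(0, end);
            const std::size_t pos = decl.find("encoding");
            if (pos != std::string::npos) {
                const std::size_t open = decl.find_first_of("\"'", pos);
                if (open != std::string::npos) {
                    const std::size_t close = decl.find(decl[open], open + 1);
                    if (close != std::string::npos)
                        return decl.substr(open + 1, close - open - 1);
                }
            }
        }
    }

    // 4. Fallback when neither a BOM nor a declaration is present.
    return "UTF-8";
}
```

In this sketch the parser never looks at HTTP headers itself; the caller
passes the charset in as the override. In Xerces-C++ the corresponding hook
on the stream side is, as far as I know, InputSource::setEncoding.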

> Xerces ignores (deletes, swallow, ignores) the UTF-8 BOM and also ignores the 
> charset parameter of the HTTP content-type: header
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: XERCESC-1967
>                 URL: https://issues.apache.org/jira/browse/XERCESC-1967
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: Non-Validating Parser
>    Affects Versions: 3.1.1
>         Environment: Mac OS X Snow Leopard (Intel).  
> (http://mirrorservice.nomedia.no/apache.org//xerces/c/3/binaries/xerces-c-3.1.1-x86-macosx-gcc-4.0.tar.gz)
> And also tested the XMLmind XML editor on the same platform.
>            Reporter: Leif Halvard Silli
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> [1] http://www.w3.org/mid/[email protected]
> [2] http://www.w3.org/mid/[email protected]
> It is an XML 1.0 spec violation: a well-formedness violation.
> Test cases without XML declaration: http://malform.no/testing/html5/bom/
> Test cases *with* XML declaration to be added later.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
