[
https://issues.apache.org/jira/browse/XERCESC-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046487#comment-13046487
]
Alberto Massari commented on XERCESC-1967:
------------------------------------------
No, I believe that RFC3023 is correct, but I am making a distinction between
the parser and the code invoking the parser.
The XML parser is responsible for providing a parse(stream) function, and knows
only what is written inside the stream; so, it expects a BOM, an encoding
declaration and an XML-compliant sequence of bytes. If the BOM and/or the
encoding declaration is missing, it has its own fallback mechanism to determine
the encoding to use for parsing. It obeys only the XML specification.
It also allows the stream to carry a setting that says "this is the encoding you
should use, regardless of what you think", which code outside the parser is
responsible for setting.
RFC3023 regulates how an HTTP transport can specify an encoding for the HTTP
communication of an XML fragment, and is correct in saying that the HTTP
envelope takes precedence over the XML content. After all, it's the HTTP
transport that took the original payload and decided to re-encode it (case 8.20
in the RFC), so the client should trust the HTTP content type more than the
internal XML fragment. In the Xerces case, the NetAccessor is the piece of
code, external to the parser, that should set on the stream the instruction
"this is your encoding, ignore what you find in the XML".
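To make the division of responsibilities concrete, here is a minimal sketch in
plain C++ of the precedence I am describing. It does not use the Xerces API:
sniffBom, EncodingHints and chooseEncoding are hypothetical names invented for
this example, and the order in which the BOM and the encoding declaration are
consulted is an assumption of the sketch, not a statement about the actual
Xerces fallback code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (not part of Xerces-C++): guess an encoding from a
// leading byte order mark. Returns std::nullopt when no BOM is found.
std::optional<std::string> sniffBom(const std::string& bytes) {
    auto startsWith = [&](const char* sig, std::size_t n) {
        return bytes.size() >= n && bytes.compare(0, n, sig, n) == 0;
    };
    // The 4-byte UTF-32 marks are tested first because the UTF-32LE mark
    // begins with the same two bytes as the UTF-16LE mark.
    if (startsWith("\xFF\xFE\x00\x00", 4)) return "UTF-32LE";
    if (startsWith("\x00\x00\xFE\xFF", 4)) return "UTF-32BE";
    if (startsWith("\xEF\xBB\xBF", 3)) return "UTF-8";
    if (startsWith("\xFE\xFF", 2)) return "UTF-16BE";
    if (startsWith("\xFF\xFE", 2)) return "UTF-16LE";
    return std::nullopt;
}

// Hypothetical bundle of everything that is known about a stream.
struct EncodingHints {
    // Set by code outside the parser, e.g. from the charset parameter of
    // the HTTP Content-Type header.
    std::optional<std::string> externalOverride;
    // What the parser found by looking at the first bytes of the stream.
    std::optional<std::string> bom;
    // The encoding="..." value of the XML declaration, if any.
    std::optional<std::string> xmlDeclaration;
};

// The external override wins; otherwise the parser falls back to what it
// sees in the stream (BOM before declaration is an assumption made here),
// and finally to UTF-8, the XML default when nothing else is known.
std::string chooseEncoding(const EncodingHints& h) {
    auto usable = [](const std::optional<std::string>& v) {
        return v.has_value() && !v->empty();
    };
    std::string result = "UTF-8";
    if (usable(h.externalOverride)) {
        result = *h.externalOverride;
    } else if (usable(h.bom)) {
        result = *h.bom;
    } else if (usable(h.xmlDeclaration)) {
        result = *h.xmlDeclaration;
    }
    assert(!result.empty());
    return result;
}
```

In this sketch the parser side would fill in bom and xmlDeclaration, while the
transport side (the NetAccessor, in the Xerces case) would fill in
externalOverride. As far as I know, in Xerces-C++ the override would be passed
through InputSource::setEncoding, but that mapping is not shown here.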
> Xerces ignores (deletes, swallow, ignores) the UTF-8 BOM and also ignores the
> charset parameter of the HTTP content-type: header
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: XERCESC-1967
> URL: https://issues.apache.org/jira/browse/XERCESC-1967
> Project: Xerces-C++
> Issue Type: Bug
> Components: Non-Validating Parser
> Affects Versions: 3.1.1
> Environment: Mac OS X Snow Leopard (Intel).
> (http://mirrorservice.nomedia.no/apache.org//xerces/c/3/binaries/xerces-c-3.1.1-x86-macosx-gcc-4.0.tar.gz)
> And also tested the XMLmind XML editor on the same platform.
> Reporter: Leif Halvard Silli
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> [1] http://www.w3.org/mid/[email protected]
> [2] http://www.w3.org/mid/[email protected]
> It is an XML 1.0 spec violation: a well-formedness violation.
> Test cases without XML declaration: http://malform.no/testing/html5/bom/
> Test cases *with* XML declaration to be added later.