[ 
https://issues.apache.org/jira/browse/XERCESC-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046449#comment-13046449
 ] 

Leif Halvard Silli commented on XERCESC-1967:
---------------------------------------------

Well, it is not only "also Xerces". I describe in [2] how Xerces behaves. I 
also link to test cases without an XML declaration. 

But to give what you ask for: 

* I ran this from the command line:
   $ pparse http://malform.no/testing/html5/bom/xml.html
* that test case page is an 'application/xhtml+xml' document
* This document is UTF-8 encoded, with a BOM, but is *served* by HTTP as 
ISO-8859-1 encoded. 
* Because HTTP says that the Content-Type charset parameter has priority over 
document-internal encoding information, the document is not well-formed: 
there is an illegal character - the "BOM" - at the beginning of the 
document. (It actually isn't a BOM character when it is read as ISO-8859-1.)
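
To illustrate the last point, here is a minimal Python sketch (not Xerces code). The byte string is a hypothetical stand-in for the test page, and Python's bundled expat parser stands in for a parser that honours the HTTP charset. It shows that the three BOM bytes become three ordinary characters when read as ISO-8859-1, and that character data before the root element then makes the document not well-formed.

```python
import codecs
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the test page: a UTF-8 BOM followed by a tiny document.
raw = codecs.BOM_UTF8 + b"<html xmlns='http://www.w3.org/1999/xhtml'/>"

# Decode the way the HTTP header dictates (charset=ISO-8859-1).
as_latin1 = raw.decode("iso-8859-1")
leading = as_latin1[:3]  # U+00EF U+00BB U+00BF, no longer a BOM


def is_well_formed(text):
    """Return True if the already-decoded text parses as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Read as ISO-8859-1: three stray characters precede the root element.
latin1_ok = is_well_formed(as_latin1)

# Read as UTF-8 with the BOM stripped ("utf-8-sig"): the document parses.
utf8_ok = is_well_formed(raw.decode("utf-8-sig"))

print(repr(leading), latin1_ok, utf8_ok)
```

Under these assumptions the ISO-8859-1 reading is rejected and the UTF-8 reading is accepted, which is the difference a parser following the HTTP charset would be expected to report.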

The Xerces pparse program should therefore emit a 'fatal error' message. But 
instead of doing so, it simply emits this:
 
http://malform.no/testing/html5/bom/xml.html: 122 ms (24 elems, 7 attrs, 0 
spaces, 2469 chars)

PS: Please note that I am not sure that Xerces should actually be corrected to 
adhere to RFC3023. I am actually advocating that XML 1.0 should be changed to 
say that the document's own encoding information overrides the HTTP 
information, because the only parsers that behave as RFC3023 says seem to be 
Opera and Firefox.
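
The two precedence rules can be contrasted in a small Python sketch. This is a simplification under stated assumptions: the function names are hypothetical, only the UTF-8 BOM is recognised as document-internal information, and the XML encoding declaration is ignored.

```python
import codecs
from email.message import Message


def http_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def bom_encoding(raw):
    """Return 'utf-8' if the bytes start with a UTF-8 BOM, else None."""
    return "utf-8" if raw.startswith(codecs.BOM_UTF8) else None


def pick_encoding(content_type, raw, http_wins=True):
    """Choose an encoding; http_wins=True models the RFC3023 ordering."""
    from_http = http_charset(content_type)
    from_doc = bom_encoding(raw)
    ordered = (from_http, from_doc) if http_wins else (from_doc, from_http)
    for enc in ordered:
        if enc:
            return enc
    return "utf-8"  # fallback assumed here when nothing is declared


# Hypothetical header and body mirroring the test case described above.
header = "application/xhtml+xml; charset=ISO-8859-1"
body = codecs.BOM_UTF8 + b"<html/>"

rfc3023_choice = pick_encoding(header, body, http_wins=True)
proposed_choice = pick_encoding(header, body, http_wins=False)
print(rfc3023_choice, proposed_choice)
```

With the HTTP header winning, the sketch selects ISO-8859-1; with the document winning, it selects UTF-8, which matches what the pparse output above suggests Xerces effectively does.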

PPS: I will add an example with a document containing an XML encoding 
declaration later on. (Time constraints.)

> Xerces ignores (deletes, swallow, ignores) the UTF-8 BOM and also ignores the 
> charset parameter of the HTTP content-type: header
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: XERCESC-1967
>                 URL: https://issues.apache.org/jira/browse/XERCESC-1967
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: Non-Validating Parser
>    Affects Versions: 3.1.1
>         Environment: Mac OS X Snow Leopard (Intel).  
> (http://mirrorservice.nomedia.no/apache.org//xerces/c/3/binaries/xerces-c-3.1.1-x86-macosx-gcc-4.0.tar.gz)
> And also tested the XMLmind XML editor on the same platform.
>            Reporter: Leif Halvard Silli
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> [1] http://www.w3.org/mid/[email protected]
> [2] http://www.w3.org/mid/[email protected]
> It is an XML 1.0 spec violation: a well-formedness violation.
> Test cases without XML declaration: http://malform.no/testing/html5/bom/
> Test cases *with* XML declaration to be added later.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

