Hi Alby,

I added BinInputStream::getContentType() some time ago so that I could accomplish this kind of thing in XQilla. My guess is that you can build Xerces-C stream encoding support on top of this. InputSource currently has a getEncoding() method, but the HTTP call hasn't been made by that point. Maybe BinInputStream also needs a getEncoding() method that takes its default from the InputSource?
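
As a rough sketch of the first step, this is one way the charset parameter could be pulled out of a content-type string such as the one getContentType() hands back. The helper name extractCharset is hypothetical and not part of Xerces-C, and the sketch assumes the value has already been transcoded to a std::string:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, not part of Xerces-C: returns the value of the
// charset parameter of a Content-Type value, or "" if there is none.
std::string extractCharset(const std::string& contentType)
{
    // Lower-case a copy so the parameter name matches case-insensitively;
    // the value itself is taken from the original string.
    std::string lower(contentType);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    std::string::size_type pos = 0;
    while ((pos = lower.find(';', pos)) != std::string::npos) {
        ++pos;  // step over the ';'
        while (pos < lower.size() &&
               std::isspace(static_cast<unsigned char>(lower[pos])))
            ++pos;  // skip whitespace before the parameter name
        if (lower.compare(pos, 8, "charset=") != 0)
            continue;  // some other parameter, keep looking
        pos += 8;

        std::string::size_type end;
        if (pos < contentType.size() && contentType[pos] == '"') {
            ++pos;  // quoted value: read up to the closing quote
            end = contentType.find('"', pos);
        } else {
            end = contentType.find_first_of("; \t", pos);
        }
        if (end == std::string::npos)
            end = contentType.size();
        return contentType.substr(pos, end - pos);
    }
    return std::string();
}
```

With this, extractCharset("text/xml; charset=EUC-KR") gives "EUC-KR", and a bare "application/xml" gives an empty string, which a getEncoding() could treat as "fall back to the InputSource default".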

John

On 09/06/11 13:44, Alberto Massari (JIRA) wrote:

     [ https://issues.apache.org/jira/browse/XERCESC-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046508#comment-13046508 ]

Alberto Massari commented on XERCESC-1967:
------------------------------------------

I don't agree with your request to reverse the priorities, but that's a 
discussion that shouldn't be held here. Good luck trying to convince the W3C.
The XML spec says that the BOM and the internal encoding declaration take 
precedence when the XML is in a *file*, because it is likely that no transcoding 
has been performed on it. For all other scenarios (when the XML is in a byte 
stream), the component that does the wrapping should take care of telling the 
parser the new setting. This is what Xerces does now, and in my opinion it's 
correct and shouldn't be changed.
What is missing in Xerces is the ability to propagate the content type read 
from the HTTP stream to the parser. Whether the content type is text/xml or 
application/xml only affects the default encoding used when the content type 
does not specify one. In case 8.20 an encoding is specified, so it doesn't 
matter which one (text/xml or application/xml) was used.
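
The precedence described above can be sketched roughly as follows. All type and function names here are hypothetical and none of them are Xerces-C API; checking the BOM before the declared encoding and falling back to UTF-8 are assumptions made for the sketch, and conflicts between the hints are not detected:

```cpp
#include <string>

// Hypothetical types for illustration only; not Xerces-C API.
enum class Source { File, ByteStream };

struct EncodingHints {
    std::string external;  // e.g. charset from the HTTP content type, "" if none
    std::string bom;       // encoding implied by a BOM, "" if none
    std::string declared;  // encoding from the XML declaration, "" if none
};

// Picks an encoding: for a byte stream, the setting supplied by the
// wrapping component wins; otherwise the BOM and the internal
// declaration are consulted.
std::string pickEncoding(Source source, const EncodingHints& h)
{
    if (source == Source::ByteStream && !h.external.empty())
        return h.external;
    if (!h.bom.empty())
        return h.bom;        // assumption: BOM is checked before the declaration
    if (!h.declared.empty())
        return h.declared;
    return "UTF-8";          // assumption: default when nothing is specified
}
```

For an HTTP response labelled EUC-KR that starts with a UTF-8 BOM, pickEncoding(Source::ByteStream, {"EUC-KR", "UTF-8", ""}) returns "EUC-KR", while the same bytes read from a file would yield "UTF-8".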

In short, if you think that pparse (or saxcount) should refuse to parse your 
web page (which has an HTTP content type specifying Korean, plus a UTF-8 BOM), 
I agree and I will try to fix it.


Xerces ignores (deletes, swallow, ignores) the UTF-8 BOM and also ignores the 
charset parameter of the HTTP content-type: header
--------------------------------------------------------------------------------------------------------------------------------

                 Key: XERCESC-1967
                 URL: https://issues.apache.org/jira/browse/XERCESC-1967
             Project: Xerces-C++
          Issue Type: Bug
          Components: Non-Validating Parser
    Affects Versions: 3.1.1
         Environment: Mac OS X Snow Leopard (Intel).  
(http://mirrorservice.nomedia.no/apache.org//xerces/c/3/binaries/xerces-c-3.1.1-x86-macosx-gcc-4.0.tar.gz)
Also tested with the XMLmind XML editor on the same platform.
            Reporter: Leif Halvard Silli
   Original Estimate: 4h
  Remaining Estimate: 4h

[1] http://www.w3.org/mid/[email protected]
[2] http://www.w3.org/mid/[email protected]
It is an XML 1.0 spec violation: a well-formedness violation.
Test cases without XML declaration: http://malform.no/testing/html5/bom/
Test cases *with* XML declaration to be added later.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

