Re: UTF-8/Latin-1 problem

David_N_Bertoni Wed, 06 Jun 2001 06:48:17 -0700

Are you saying that your document has an XML decl with the correct encoding
and the parser is not honoring the encoding?  That sounds like a huge bug,
and I can't believe it would not have been caught in testing.  If so, you
should file a bug in Bugzilla and attach a sample document that reproduces
the problem.
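A reproducer for such a report can be very small. The sketch below is one hypothetical shape for it. It uses the standard JAXP SAX API (javax.xml.parsers), which this thread does not mention and is an assumption, rather than any Xerces-specific class. It feeds the parser ISO-8859-1 bytes that declare their encoding and checks whether "ü" (byte 0xFC) comes back intact. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical reproducer: does the parser honor encoding="ISO-8859-1"?
public class EncodingDeclRepro {
    /** Parses the raw bytes with a SAX parser and returns all character data. */
    public static String parseText(byte[] doc) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        StringBuilder text = new StringBuilder();
        parser.parse(new ByteArrayInputStream(doc), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Character data may arrive in several chunks; collect them all.
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The document is encoded in ISO-8859-1 and says so in its XML declaration.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>Sch\u00fcle</name>";
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);
        String text = parseText(latin1);
        // If the declaration is honored, the umlaut survives the round trip.
        System.out.println("Sch\u00fcle".equals(text) ? "declaration honored" : "BUG: got " + text);
    }
}
```

If this prints the "BUG" line, the program and its input are exactly what belongs in the report.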

If, on the other hand, you're saying your document is encoded in iso-8859-1
but there is no encoding in the XML decl, then you have a document that
is not well-formed.  There is no way for the parser to "autodetect"
iso-8859-1.  Indeed, in the absence of an explicit encoding (or a UTF-16
byte order mark), it is required to assume utf-8.
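To make the rule concrete, here is a simplified sketch of how a parser settles on an encoding, loosely following the non-normative Appendix F of the XML 1.0 recommendation. The class and method names are hypothetical, and only byte order marks and the encoding pseudo-attribute of an ASCII-compatible declaration are handled (EBCDIC, UCS-4 and BOM-less UTF-16 detection are left out). Nothing in the bytes of a declaration-less ISO-8859-1 file distinguishes it from UTF-8, so the sketch falls through to the UTF-8 default, just as a real parser must.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified encoding detection for an XML document entity.
public class EncodingSniffer {
    // Matches the encoding pseudo-attribute inside a leading XML declaration.
    private static final Pattern ENCODING =
            Pattern.compile("^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Guesses the encoding of an XML document from its first bytes. */
    public static String sniff(byte[] b) {
        // 1. A byte order mark settles the question.
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        // 2. Otherwise look for <?xml ... encoding="..."?> in the first bytes.
        // ISO-8859-1 maps every byte to one char, so this decoding never fails.
        String head = new String(b, 0, Math.min(b.length, 200), StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        if (m.find()) {
            return m.group(1);
        }
        // 3. No BOM and no declaration: the spec leaves only UTF-8.
        return "UTF-8";
    }
}
```

Fed a declaration-less Latin-1 file, sniff returns "UTF-8", and a conforming parser then meets byte 0xFC as a malformed UTF-8 sequence: a well-formedness error, not a detection failure.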

By the way, this question is not appropriate for the general list.  You
should subscribe to the Xerces-J mailing list and post your parser-related
questions there.

Dave



Britta Schüle <britta.schuel[EMAIL PROTECTED]>
06/06/2001 04:10 AM
Please respond to general

To:      [EMAIL PROTECTED]
cc:      (bcc: David N Bertoni/CAM/Lotus)
Subject: UTF-8/Latin-1 problem



Hi,
I'm working on a project where XML files might have all sorts of encodings. The
parser deals with the UTF-8 stuff just fine, but when it gets a Latin-1
(iso-8859-1) file, it produces garbled characters unless I set the encoding
explicitly.
Now I can't quite believe that the parser won't read the encoding from the
XML file, so my question is: am I missing something? Is there a way to get the
parser to sort of "autodetect" an XML file's encoding?
I'm currently testing with the SAX2SAX sample in the Xalan-Java 2 download.
Thanks loads in advance,
Britta
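The "set the encoding explicitly" workaround Britta mentions amounts to supplying the encoding from outside the document, which the XML spec permits. The sketch below is a standalone illustration, not how SAX2SAX itself is written. It assumes the standard JAXP/SAX API and a hypothetical class name, and wraps the bytes in an InputStreamReader so the parser receives characters that are already decoded.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parse a document whose encoding is known in advance.
public class ExplicitEncoding {
    /** Decodes the stream with the given charset, parses it, and returns its character data. */
    public static String parseWith(InputStream in, Charset charset) throws Exception {
        // A character stream bypasses the parser's own byte-level encoding detection.
        Reader reader = new InputStreamReader(in, charset);
        StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(reader), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Latin-1 bytes with no XML declaration; byte 0xFC ("ü") is invalid as UTF-8.
        byte[] latin1 = "<name>Sch\u00fcle</name>".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseWith(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1));
    }
}
```

InputSource.setEncoding is the other standard way to pass this external information. A Reader is used here because it leaves the parser no decoding decision to make. Adding an encoding declaration to the files themselves avoids the need for either.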

---------------------------------------------------------------------
In case of troubles, e-mail:     [EMAIL PROTECTED]
To unsubscribe, e-mail:          [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]