Ack!

Please do not use outdated versions of Xerces libraries. It makes it
difficult to assure you that your problem is not some sort of bug.

First I would try upgrading.

If upgrading does not work, perhaps you could search the user and dev lists
for FFFE and FEFF, and also try the xml.apache.org JIRA database. They
might have a record of this being a fixed bug, etc.

HTH!
Matt

-----Original Message-----
From: Xiaofan Zhou [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 26, 2005 3:06 PM
To: c-dev@xerces.apache.org
Subject: RE: Invalid character 0xFFFe


Matt, 

That's what I did, and I couldn't find FFFE at all. BTW, I am using
Xerces-c 1.7, could that be a problem?

Thanks.

Frank 

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 26, 2005 2:52 PM
To: c-dev@xerces.apache.org
Subject: RE: Invalid character 0xFFFe

Can't you write a routine that does a trivial O(n) search for that byte
sequence, or do a memory search with GDB?

Anything that could tell you where it is would be a good idea, just to
find out if it's actually there or if something else is wrong.
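
A minimal sketch of such a linear search, in case it helps. It assumes the
serialized document is available either as 16-bit code units or as raw bytes.
std::uint16_t is only a stand-in for whatever your build uses for XMLCh, and
the function names find_code_unit and find_byte_pair are made up for this
illustration; they are not Xerces calls.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of every 16-bit code unit equal to `target`.
// Assumption: std::uint16_t stands in for the buffer's real unit type.
std::vector<std::size_t> find_code_unit(const std::uint16_t* data,
                                        std::size_t length,
                                        std::uint16_t target = 0xFFFE) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < length; ++i) {
        if (data[i] == target) {
            hits.push_back(i);
        }
    }
    return hits;
}

// Returns the offset of every occurrence of the byte pair (first, second)
// in a raw byte buffer, e.g. 0xFF 0xFE.
std::vector<std::size_t> find_byte_pair(const unsigned char* bytes,
                                        std::size_t length,
                                        unsigned char first,
                                        unsigned char second) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i + 1 < length; ++i) {
        if (bytes[i] == first && bytes[i + 1] == second) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

If both searches come back empty on the buffer you hand to the parser, the
character is probably being produced somewhere after that point, for example
by a wrong length or a wrong encoding label.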

Maybe print out your document array as ASCIIfied hex with
printf("...%x...") and so forth, then grep it for 0xFFFE.

Or dump it into a file and look at it with a hex editor.
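
Something along these lines would do for the hex dump. It is only a sketch:
hex_dump is a hypothetical helper name, and it assumes the buffer holds 16-bit
code units (std::uint16_t again standing in for XMLCh). Each unit is printed
as four uppercase hex digits so that grepping the output for FFFE is
unambiguous.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats `length` 16-bit code units as space-separated 4-digit hex,
// `per_line` units per line. Returns the text so the caller can print it
// or write it to a file.
std::string hex_dump(const std::uint16_t* data,
                     std::size_t length,
                     std::size_t per_line = 8) {
    if (per_line == 0) {
        per_line = 8;  // avoid dividing by zero below
    }
    std::string out;
    char cell[8];
    for (std::size_t i = 0; i < length; ++i) {
        std::snprintf(cell, sizeof cell, "%04X",
                      static_cast<unsigned>(data[i]));
        out += cell;
        // End the line after per_line units or after the last unit.
        const bool end_of_line = ((i + 1) % per_line == 0) || (i + 1 == length);
        out += end_of_line ? '\n' : ' ';
    }
    return out;
}
```

Printing the result with std::fputs(hex_dump(buf, n).c_str(), stdout) and
redirecting it to a file gives you something you can grep.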

--Matt

-----Original Message-----
From: Xiaofan Zhou [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 26, 2005 2:48 PM
To: c-dev@xerces.apache.org
Subject: RE: Invalid character 0xFFFe


 Thanks much for the reply. 

But in my case, the XML is created in memory, and I don't see the BOM
char at the beginning when I look at it in memory in the debugger. In fact,
I don't see FFFE at all in memory. Also, if I make the XML smaller, then
everything works fine. So I would think that the FFFE is introduced
when I dump my DOM tree into a string (all this is done in memory), but
I just don't know where.

Any suggestion is very much appreciated.

Thanks.

Frank

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, May 26, 2005 2:34 PM
To: c-dev@xerces.apache.org
Subject: RE: Invalid character 0xFFFe

FF FE is the little-endian byte sequence of the Byte Order Mark (U+FEFF).
Byte Order Marks are not generally valid in in-memory Unicode data,
only in Unicode files.
It's probably at byte 0 of the input array. Can you strip it out?
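
If it does turn out to be at the start, a sketch like the following could
skip it before the buffer reaches the parser. bom_length is a hypothetical
helper, not part of Xerces. It assumes a raw byte buffer and only recognises
the UTF-8 and UTF-16 marks; a UTF-32 little-endian mark (FF FE 00 00) would be
reported as UTF-16 here.

```cpp
#include <cstddef>

// Returns how many leading bytes form a Byte Order Mark:
// 3 for UTF-8 (EF BB BF), 2 for UTF-16 (FF FE or FE FF), otherwise 0.
std::size_t bom_length(const unsigned char* bytes, std::size_t length) {
    if (length >= 3 &&
        bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF) {
        return 3;
    }
    if (length >= 2 &&
        ((bytes[0] == 0xFF && bytes[1] == 0xFE) ||
         (bytes[0] == 0xFE && bytes[1] == 0xFF))) {
        return 2;
    }
    return 0;
}
```

The caller would then pass bytes + bom_length(bytes, length) and the
correspondingly shortened length to the parser's input source.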

See this quotation from
http://www.xencraft.com/resources/unicodebom.html :

The BOM character
The Unicode Character Standard designated two characters as an aid to
distinguish big-endian data from little-endian data. "Endianness" is not
a problem for UTF-8 since it is a serialized byte stream. However, to
process data encoded in UTF-16 or UTF-32, an application must first
determine if the data being read is in the same or different
"endianness" from the architecture that the application runs on. 
Unicode designated the character U+FEFF as the "Byte Order Mark" (BOM)
and reserved U+FFFE as an illegal character. If an application detects a
U+FFFE it can therefore presume that the data is in the opposite
endianness of the architecture and that the data should be byte swapped.
(A 32-bit architecture should also be word swapped.)

HTH!
Matt

-----Original Message-----
From: Xiaofan Zhou [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 26, 2005 1:27 PM
To: xerces-c-dev@xml.apache.org
Subject: Invalid character 0xFFFe


Hi, All, 

I am encountering an XML parsing error saying something like the following:


XML Parser failed, Error: Invalid character (Unicode 0xFFFe)

The XML is about 100K on disk. The application works roughly like
this: the XML is created from a DOM tree in memory and then passed to a SAX
parser.
The error is reported by the SAX parser.

I'm not sure where the Unicode 0xFFFe comes from. If I save the XML to a file
and then load it into Internet Explorer, it is fine.

Also, if I reduce the size, it works fine.

Any suggestions?  Thanks much in advance.

Frank


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

