You are trying to convert something that is not
UTF-8 into Unicode using the UTF-8 rules; I hope
you realize that this is simply wrong and could
either fail or give invalid results. I would try
to find out which encoding is actually being used,
and use that to perform the conversion.
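
As a rough illustration of that advice, here is a minimal sketch that first checks whether the bytes are well-formed UTF-8 and, only if they are not, re-encodes them under the assumption that they are ISO-8859-1 (Latin-1). That assumption is mine; nothing in this thread confirms the real encoding. Both isValidUtf8 and latin1ToUtf8 are hypothetical helpers, not Xerces-C functions. Note that é is byte 0xE9 in Latin-1, and 0xE9 has the bit pattern of a UTF-8 lead byte for a 3-byte sequence, which would explain a complaint about the byte that follows it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Xerces-C API): returns true when `s` is
// well-formed UTF-8 (no stray bytes, overlong forms, surrogates, or
// code points above U+10FFFF).
bool isValidUtf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;   // number of continuation bytes expected
        unsigned long cp;    // code point being assembled
        unsigned long min;   // smallest code point allowed for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        min = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min = 0x10000; }
        else return false;                    // continuation or invalid lead byte
        if (extra > n - i - 1) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;
        }
        i += extra + 1;
    }
    return true;
}

// Hypothetical helper, ASSUMING the input is ISO-8859-1: each Latin-1 byte
// value is its own Unicode code point, so bytes >= 0x80 become two UTF-8 bytes.
std::string latin1ToUtf8(const std::string& s) {
    std::string out;
    out.reserve(s.size() * 2);
    for (unsigned char b : s) {
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}
```

A caller could run isValidUtf8 on the raw text and hand it to the UTF-8 transcoder only when the check passes; otherwise it would convert with the transcoder (or a helper like the one above) that matches the real encoding.
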
As for embedding an entire XML file in a CDATA
section, that's fine unless the XML file itself
contains CDATA sections (the XML rules prohibit
the ]]> sequence from appearing inside a CDATA section).
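
One common workaround, sketched here on plain strings and not taken from Xerces, is to end the CDATA section in the middle of every ]]> and immediately open a new one. wrapInCData is a hypothetical helper name; it assumes you are building the markup text yourself instead of going through the DOM.

```cpp
#include <string>

// Hypothetical helper (not a Xerces-C API): wrap `text` in CDATA markup,
// splitting every "]]>" so the terminator never appears inside a section.
std::string wrapInCData(const std::string& text) {
    static const std::string terminator = "]]>";
    // "]]" closes out the current section's data, "]]>" ends the section,
    // "<![CDATA[" opens the next one, and ">" is the rest of the terminator.
    static const std::string split = "]]]]><![CDATA[>";

    std::string out = "<![CDATA[";
    std::string::size_type pos = 0;
    for (;;) {
        const std::string::size_type hit = text.find(terminator, pos);
        if (hit == std::string::npos) {
            out.append(text, pos, std::string::npos);  // copy the remainder
            break;
        }
        out.append(text, pos, hit - pos);  // copy text before the terminator
        out += split;
        pos = hit + terminator.size();
    }
    out += "]]>";
    return out;
}
```

A parser reading the result sees adjacent CDATA sections whose combined character data equals the original text, including the ]]> sequences.
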
Alberto

At 06.00 21/05/2007 -0700, Mihai Matei wrote:
This is the message: "invalid byte 2 (() of a 3-byte sequence."
I think it refers to an extended-ASCII char, é (e acute in
extended fr ascii table). Granted, this is not a UTF-8 character.
Could I detect these characters and replace them with "?"?

One other question: what happens if I want to place the contents
of an XML file in a CDATA section? Will I have to escape any
characters so as not to break Xerces's parsing? "[CDATA[" and
"]]" spring to mind.

----- Original Message ----
From: Alberto Massari <[EMAIL PROTECTED]>
To: [email protected]
Sent: Monday, May 21, 2007 1:10:02 PM
Subject: Re: UTF-8 characters in CDATA section

At 05.01 21/05/2007 -0700, Mihai Matei wrote:
>Thanks for that.
>
>I'm getting an XMLException when parsing the UTF-8 chars,
>e.getMessage() is 40094298.

e.getMessage() returns an XMLCh* that you are probably printing
as a void*; use XMLString::transcode and see what the message is.

Alberto

>what should I set the charSizes pointer to in this function?:
>
>XMLUTF8Transcoder::transcodeFrom(const XMLByte* const srcData
>                                 , const unsigned int srcCount
>                                 , XMLCh* const toFill
>                                 , const unsigned int maxChars
>                                 , unsigned int& bytesEaten
>                                 , unsigned char* const charSizes)
>
>At the moment, I'm doing a unsigned char* charSizes = new
>unsigned char[len+1], but it does not work.
>
>Regards,
>Mihai Matei
>
>----- Original Message ----
>From: Alberto Massari <[EMAIL PROTECTED]>
>To: [email protected]
>Sent: Monday, May 21, 2007 8:32:28 AM
>Subject: Re: UTF-8 characters in CDATA section
>
>At 17.59 19/05/2007 -0700, Mihai Matei wrote:
> >Can you point me to some sample code doing this? Do I have to
> >recompile the library with this new transcoder?
>
>No, the transcoder is already part of the library. Try something
>like this:
>
>    XMLUTF8Transcoder tx(0, 512);
>    size_t len = XMLString::stringLen(toTranscode);
>    // not const: transcodeFrom writes into this buffer
>    XMLCh* unicode = new XMLCh[len+1];
>    // receives the byte count of each decoded character
>    unsigned char* charSizes = new unsigned char[len+1];
>    unsigned int charsEaten = 0;
>    tx.transcodeFrom((const XMLByte*)toTranscode, len+1,
>                     unicode, len+1, charsEaten, charSizes);
>
>    ...
>
>    delete [] charSizes;
>    delete [] unicode;
>
>Alberto
>
> >----- Original Message ----
> >From: Alberto Massari <[EMAIL PROTECTED]>
> >To: [email protected]
> >Sent: Friday, May 18, 2007 10:46:40 AM
> >Subject: Re: UTF-8 characters in CDATA section
> >
> >The X() macro is a helper class that converts from the local
> >encoding to Unicode; if you have UTF-8 data, you need to use
> >the UTF-8 transcoder instead.
> >
> >Alberto
> >
> >At 02.40 18/05/2007 -0700, Mihai Matei wrote:
> > >Hi,
> > >
> > >I'm trying to add the attached file's contents to a CDATA
> > >section in an xml. It contains a few Unicode-UTF8 characters
> > >from http://www.columbia.edu/kermit/utf8-t1.html.
> > >(you can view the file with Firefox, set the Character
> > >Encoding to Unicode(UTF8)).
> > >
> > >//string 'text' has the contents;
> > >//if I output it to a file with ofstream, the UTF8 characters are preserved
> > >
> > >DOMElement* pText = pDoc->createElement(X(tag.c_str()));
> > >DOMCDATASection* pCdata = pDoc->createCDATASection(X(text.c_str()));
> > >pText->appendChild(pCdata);
> > >parent->appendChild(pText);
> > >
> > >the resulting xml however loses the UTF-8 characters. Is it
> > >the X() macro that is to blame, or can I set other XML
> > >Document properties so I keep my UTF8 chars?
> > >
> > >Thanks.