You shouldn't need to use std::wstring, and in fact, you shouldn't.
You can store UTF-8 in a std::string, because std::string is just a
sequence of chars and does not interpret its contents. What it will not
do is understand UTF-8: size() and indexing count bytes, not characters.
(I haven't used any implementation of the class myself, but this follows
from how the class is specified.)
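Here is a minimal sketch of what storing UTF-8 in a std::string looks
like in practice. The helper names countCodePoints and toHex are made up
for illustration, the code assumes a C++11 compiler, and it assumes the
input is already well-formed UTF-8. It also suggests where the
0xffffffd0 values in your hex dump most likely come from: a char holding
0xd0 is negative on many platforms, and it gets sign-extended when
printed as an int.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Counts code points by skipping UTF-8 continuation bytes (10xxxxxx).
// Assumes well-formed UTF-8; no validation is performed.
std::size_t countCodePoints(const std::string& utf8)
{
    std::size_t n = 0;
    for (std::string::size_type i = 0; i < utf8.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Formats each byte as two hex digits, separated by spaces.
// Casting to unsigned char first avoids the sign extension that
// turns 0xd0 into 0xffffffd0.
std::string toHex(const std::string& bytes)
{
    std::string out;
    char buf[4];
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
        if (!out.empty()) {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```

For the six bytes in your dump (0xd0 0xb1 repeated three times), size()
reports 6 while countCodePoints reports 3, since each 0xd0 0xb1 pair is
the UTF-8 encoding of the single character U+0431.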
Note that UTF-8 *is* Unicode. ("UTF" stands for "Unicode Transformation
Format.") Perhaps you are asking how to transcode between UTF-8 and
UTF-16, which is the encoding Xerces uses internally. (UTF-16 is also
the native format of Windows NT-based operating systems, and is
unfortunately referred to by Microsoft as "Unicode." This perpetuates
the confusion between Unicode and its encodings.) If so, there are many
messages in the mailing list archives that address this question. See,
for example, http://marc.info/?l=xerces-c-users&m=119514889329902&w=2.
The archives are listed at
http://xerces.apache.org/xerces-c/mailing-lists.html.
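To show what the transcoding itself involves, here is a hand-written
sketch in standard C++ that does not use Xerces at all. The function
names utf8ToUtf16 and utf16ToUtf8 are hypothetical, std::u16string
assumes a C++11 compiler, and the decoder does not reject overlong
encodings or encoded surrogates. With Xerces you would normally use the
library's own transcoding services, as the archived messages describe;
this only illustrates the bit manipulation behind them.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes UTF-8 bytes into UTF-16 code units.
std::u16string utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (extra > in.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of range");
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {
            // Outside the BMP: split into a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}

// Encodes UTF-16 code units as UTF-8 bytes.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size())
                throw std::runtime_error("unpaired high surrogate");
            const char32_t lo = in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With these, the two bytes 0xd0 0xb1 become the single UTF-16 code unit
0x0431, and passing that back through utf16ToUtf8 returns the original
two bytes.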
-----Original Message-----
From: Anna Simbirtsev [mailto:[EMAIL PROTECTED]
Sent: Friday, September 19, 2008 11:21 AM
To: [email protected]
Subject: Re: Problems with xerces-c version 1.7.0 and UTF-8
Also, do I need to use std::wstring to store UTF-8 strings, or will I be
OK with std::string?
Thank you
On Fri, 2008-09-19 at 09:40 -0400, Anna Simbirtsev wrote:
> Hi,
>
> Could you give me an example of how to transcode a UTF-8 string to
> Unicode and back? I think that if I get the string in UTF-8 encoding,
> I need to convert it to Unicode before I pass it into the Xerces
> parser?
>
> On Wed, 2008-09-17 at 09:58 -0700, David Bertoni wrote:
> > Anna Simbirtsev wrote:
> > > When I print it in hex format, I get
> > > : 0xffffffd0
> > > : 0xffffffb1
> > > : 0xffffffd0
> > > : 0xffffffb1
> > > : 0xffffffd0
> > > : 0xffffffb1
> > >
> > > I am not even sure what format that is, but maybe my shell does
> > > not know what it is.
> > You need to understand the limitations of any library you use. Here is
> > a snippet of the source code from the domtools library you're using:
> >
> > string domtools::toString(const DOMString s)
> > {
> >     char * t = s.transcode();
> >     if (!t) return "";
> >     string tmp = t;
> >     delete [] t;
> >     return tmp;
> > }
> >
> > You can see the call to DOMString::transcode(). This will fail when
> > characters in the DOMString are not representable in the local code
> > page. This is likely what's happening, and I suggest you find another
> > library to use, because this one is broken.
> >
> > Alternately, if you always want to transcode data to UTF-8, you can
> > modify the library to use a UTF-8 transcoder. There was another thread
> > late last week and this week on this topic.
> >
> > Dave
>