Re: How do I use Xerces strings?

Steven T. Hatton Tue, 07 Mar 2006 23:06:30 -0800

On Wednesday 08 March 2006 01:36, Scott Cantor wrote:
> > <quote url=http://doc.trolltech.com/4.1/qstring.html>
> > The QString class provides a Unicode character string.
> > QString stores a string of 16-bit QChars, where each QChar
> > stores one Unicode 4.0 character.
>
> That sounds like UTF-16, but it's not using the terminology that would give
> me warm fuzzies. There are, I believe, other 16-bit encodings, but I could
> be mistaken about that. It's certainly worth a try, but Unicode is one of
> those things where you'd need to try the hard stuff before you'd hit the
> problems.
>
> > So, if everybody is telling the truth, can I not just do this?
> >
> > const XMLCh* QtoX(const QString& s) {
> >   return reinterpret_cast<const XMLCh*>(s.constData());
> > }
> >
> > const XMLCh* CtoX(const char* cs) { return QtoX(cs); }
>
> Absent memory management issues, it's possible, yeah.
>
> I'm not sure if that second function would work though. It seems like
> you're counting on some auto-conversion via QString to convert the ASCII,
> and then returning a cast of its internal buffer. That's a recipe for crash
> city, I would think (temp object created, reference passed, pointer to
> internals returned, object destroyed, pointer invalid).


I believe the second function is superfluous.  Yes, I am depending on a
conversion from const char* to QString.  As for the pointer becoming invalid,
that probably depends on what Xerces does with what I pass.  If I call QtoX
directly in the argument list, the temporary QString lives until the end of
that full expression, so if the first thing Xerces does is copy the data,
then I'm home free.  Or, at least that is my understanding of the C++
Standard.  I'm not sure exactly where the overhead
from transcoding comes in.  It looks like the transcoder will look at each 
character individually to determine if a conversion needs to happen.  That 
would be far more expensive than a simple "Trust me. I know this is properly 
encoded".  IIRC, there /are/ different 16-bit encodings of Unicode.  There
is something called UCS-2, and also something called UCS-4 (I believe).  I
do not know the difference between these and their related UTF-16 and
UTF-32.
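
As a point of reference on the encodings mentioned above: UCS-2 is a
fixed-width 16-bit encoding that can only represent code points up to U+FFFF,
while UTF-16 uses the same 16-bit units but encodes code points above U+FFFF
as a pair of surrogates.  UCS-4 and UTF-32 are, for practical purposes, the
same fixed-width 32-bit encoding.  A minimal sketch of the surrogate-pair
rule follows.  It uses char16_t and std::u16string as stand-ins for XMLCh and
QString (an assumption made so the example needs neither library), and
countCodePoints is a hypothetical helper name, not part of Qt or Xerces.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-16 sequence.
// A high surrogate (0xD800..0xDBFF) followed by a low surrogate
// (0xDC00..0xDFFF) encodes one code point above U+FFFF.
// UCS-2 has no such pairs, so there units and code points always match.
std::size_t countCodePoints(const std::u16string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool isHigh = s[i] >= 0xD800 && s[i] <= 0xDBFF;
        const bool nextIsLow = (i + 1 < s.size()) &&
                               s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        if (isHigh && nextIsLow) {
            ++i;  // skip the low half of the pair
        }
        ++count;  // unpaired surrogates are counted as one unit each
    }
    return count;
}
```

The reinterpret_cast approach is unaffected by this as long as both sides
really hold UTF-16 units of the same size and byte order.  The distinction
only matters to code that assumes one 16-bit unit is always one character.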

Another potential source of overhead is memory allocation.  Qt uses a shared 
memory model for QString, but I don't know what that will buy me.
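
One way to trade a single allocation for a pointer that cannot dangle is to
return an owning buffer instead of a pointer into a temporary.  The sketch
below is only an illustration under stated assumptions: XmlUnit stands in for
XMLCh (whose real typedef depends on the platform and the Xerces build),
XmlString and asciiToXml are hypothetical names, and the input is assumed to
be plain 7-bit ASCII so that widening each byte is a valid conversion.

```cpp
#include <string>

// Stand-in for Xerces' XMLCh; the real type is platform-dependent.
using XmlUnit = char16_t;
using XmlString = std::basic_string<XmlUnit>;

// Widen a 7-bit ASCII C string into an owning buffer of 16-bit units.
// The caller keeps the returned object alive for as long as the pointer
// obtained from c_str() is in use. A null input yields an empty string.
XmlString asciiToXml(const char* cs) {
    XmlString out;
    for (const char* p = cs; p != nullptr && *p != '\0'; ++p) {
        out.push_back(static_cast<XmlUnit>(static_cast<unsigned char>(*p)));
    }
    return out;
}
```

A caller would hold the result in a named variable, for example
"const XmlString name = asciiToXml("element");", and pass name.c_str() to the
parser.  The pointer then stays valid until name goes out of scope, whether
or not the callee copies the data first.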

Steven
