On Sun, Sep 09, 2001 at 05:31:27PM +0200, Bruno Haible wrote:
> > > You cannot even assume that. wchar_t is locale dependent and
> > > OS/compiler/vendor dependent. It should never be used for "binary file
> > > formats and network messages".
> >
> > Well, I have to normalize to something!
>
> wchar_t is a very wrong thing to normalize to, because it is OS and
> locale dependent. UTF-8 is a much better normalization for strings,
> both in-memory and on disk. UCS-4 is an alternative, good
> normalization for strings in memory.
Well, then what's it good for?
Maybe we misunderstand each other. Perhaps if I tell you exactly what
I'm trying to do you can just tell me how I should do it?
I want to encode and decode binary data from sockets and files
(streams). Because serializing and deserializing integers and strings is
fundamental to these problems, I have written a very lightweight piece
of code designed specifically to abstract this functionality. I have
placed the work at the URL below if you care to examine it:
http://auditorymodels.org/encdec/
I would like the code to be as general as possible. For one project
(an SMB/CIFS server) I will be decoding and encoding many UCS-2LE (or
UTF-16LE, not sure) strings. Another interest of mine is the MS Word
binary file format, which has a slew of different string types potentially
mixed into the same document.
I thought that normalizing strings to wchar_t was the wise choice because
I could take advantage of the existing string manipulation functions
like wcslen, wcsstr, etc. (Actually, I believe someone on this list
instructed me to use wchar_t regarding a similar question.) But now I
should use UTF-8?
In light of the detail above, can you tell me what the ideal solution
to this problem would be?
Mike
--
Wow a memory-mapped fork bomb! Now what on earth did you expect? - lkml
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/