On Wed, 18 Sep 2002 22:26:40 +0100 (BST) Robert de Bath <[email protected]> wrote:
> On Wed, 18 Sep 2002, Michael B. Allen wrote:
>
> > Perhaps encdec. The interface is a little nicer for common situations.
> >
> > http://freshmeat.net/projects/encdec/
>
> Nice, still not as 'smooth' as I'd hoped and I don't think there's any
> support for 'character' counting as opposed to 'display cell' counting.

The encdec package is not a "unicode conversion library". The other
common misconception is to think it's a set of serialization primitives
like XDR. It can be used very effectively in that way but that's
incidental. Encdec's real function is to pick apart arbitrary binary
file formats and network messages. I have used it extensively to decode
and *encode* MS SMB, MS Word 97, MS Structured Storage compound
documents (encodes DIRENTs as RB trees!), MS Enhanced Metafiles (EMF),
TI COFF images into PalmOS PDB files, ... etc.

The point is that when doing this sort of thing you never know what
you're going to run into. MS formats in particular will have a UCS-2LE
pascal-ish string and then a cp1250 string right next to it. There might
be some field that's supposed to be N *number of characters* encoded in
some array. Yes, this is somewhat rare but it does happen, and in my
experience it is not very common to limit by display positions when
doing this kind of work either. At the time my reasoning was that it is
safer to model the concept of a string as a sequence of characters (see
sig) and I still believe that would be ideal if it did not incur
unacceptable performance penalties.

The encdec string interface is designed to be as open ended as possible.
You can limit by source bytes, destination bytes, and character count.
You can use -1 for all of them and stop at '\0', or use all of the
limits, or some and not others. I might change that cn limit to a pn in
a future version but portability is more of an issue at the moment as it
requires __STDC_ISO_10646__.

Is libiconv capable of doing wchar_t, UCS-4, and UTF-8 operations on
Windows?
I couldn't even build it (although I didn't try very hard).

--
A program should be written to model the concepts of the task it
performs rather than the physical world or a process because this
maximizes the potential for it to be applied to tasks that are
conceptually similar and more importantly to tasks that have not
yet been conceived.
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
