Marcin 'Qrczak' Kowalczyk wrote on 2003-07-03:

> On Thu, 3 July 2003 at 18:17, Beni Cherniavsky wrote:
>
> > How many custom runtime libraries are you ready to write and use
> > (vs. compiling to calls to standard libraries)?
>
> I'm prepared to interface to iconv and such, and to different I/O
> systems on different platforms.  I don't want to implement conversions
> from scratch.  Character predicates and string handling will be
> implemented from scratch.
>
> I'm more afraid of causing headaches for other people trying to
> interface to C libraries.
>
Please elaborate - interface from what to which C libraries?  Do you
mean people writing C extensions to your language?
> > Are you ready to require sane i18n libraries (GNU libc, libiconv,
> > etc.) or do you want to work out-of-the-box with any unix vendor's
> > half-baked i18n facilities?
>
> It will probably have rich i18n support if libraries are available.
> With just ANSI C available it will fall back to minimal support for a
> few encodings.
>
> I'm not sure what interesting things these libraries provide, besides
> recoding and character predicates.
>
> > The last question repeated but more
> > amplified for windows - how well do you want to work with
>
> You haven't finished the sentence.
>
Indeed.  I wanted to ask how well it should work with the different
versions of Windows, especially Win9x with its close-to-nonexistent
Unicode support.

> > 2.1. Strings are internally in UTF-8 but the programmer is presented
> >      with an illusion of character-level indexing.  This can be
> >      sub-optimal with some access patterns so you might want to
> >      provide some kind of iteration abstraction to reduce the use of
> >      indexing.
>
> I think it would be inefficient or horribly complicated.  Ideas like
> caching the last character-to-byte mapping used in a string are not an
> option - it's way too ugly.  I'm free to define what indexing a string
> should mean, but it should be consistent with the physical
> representation.
>
OK.  Then if you want to expose an interface to string indexing by
Unicode codepoints at all (but see srintuar26's comment), you are
almost bound to use UTF-32.  UTF-16 would have precisely the same
indexing problems as UTF-8.

> > If you write a recoding subsystem anyway, why would you ever want
> > to touch UTF-16?
>
> Because I think it's widely used in the Windows API.  Unfortunately I
> don't program on Windows at all.
>
Neither do I, so don't take my comments too seriously ;-).  Indeed,
AFAIK the only way to make Unicode calls in Windows is to go through
UTF-16 (and most of these calls will break on Win9x, curse M$).
However, UTF-16 is badly inconvenient internally.
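The indexing problem above can be made concrete with a small sketch: in any variable-width encoding, finding the n-th codepoint requires a linear scan.  The `utf8_index` helper below is illustrative only (it assumes well-formed UTF-8); UTF-16 has the same cost once surrogate pairs appear, and only a fixed-width encoding like UTF-32 gives O(1) indexing.

```c
#include <stddef.h>

/* Return a pointer to the n-th codepoint (0-based) of a well-formed
 * UTF-8 string, or NULL if the string has fewer codepoints.  This is
 * necessarily O(n): a lead byte can only be found by scanning, which
 * is the indexing problem discussed above.  Continuation bytes have
 * the bit pattern 10xxxxxx, i.e. (byte & 0xC0) == 0x80. */
const char *utf8_index(const char *s, size_t n)
{
    for (; *s; s++) {
        if ((*s & 0xC0) != 0x80) {      /* lead byte: a new codepoint starts */
            if (n == 0)
                return s;
            n--;
        }
    }
    return NULL;                        /* string too short */
}
```

For example, in the 3-codepoint string "\xC5\x82o\xC5\x9B" ("łoś"), codepoint 1 starts at byte offset 2, not 1, so byte indexing and codepoint indexing disagree as soon as any multi-byte character appears.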
The question is internal efficiency vs. OS-call efficiency - make your
choice.

-- 
Beni Cherniavsky <[EMAIL PROTECTED]>

"Reading the documentation I felt like a kid in a toy shop."
    -- Phil Thompson on Python's standard library

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
