On Thu, 3 July 2003 18:17, Beni Cherniavsky wrote:
> How optimized should it be?
Like Lisp or Smalltalk, i.e. quite well optimized for a dynamically typed
language, but not as much as statically typed languages usually are.
> How much custom runtime libraries are you ready to write and use
> (vs. compiling to calls to standard libraries)?
I'm prepared to interface to iconv and such, and to different I/O systems on
different platforms. I don't want to implement conversions from scratch.
Character predicates and string handling will be implemented from scratch.
I'm more afraid of causing headaches for other people who try to interface
to C libraries.
> Are you ready to require sane i18n libraries (GNU libc, libiconv,
> etc.) or do you want to work out-of-the-box with any unix vendor's
> half-baked i18n facilities?
It will probably have rich i18n support if libraries are available. With just
ANSI C available it will fall back to minimal support for a few encodings.
I'm not sure what interesting things these libraries provide, besides
recoding and character predicates.
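To make the fallback idea concrete, here is a rough sketch (in Python for brevity; the real runtime would do this in C). It is not code from the implementation. The names `latin1_to_utf8`, `to_utf8` and `library_recode` are hypothetical; `library_recode` stands for whatever iconv-like converter happens to be available, and the built-in set of encodings is only an assumed example of "minimal support".

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Recode ISO-8859-1 bytes to UTF-8 without any conversion library."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            # ASCII bytes are identical in both encodings.
            out.append(b)
        else:
            # U+0080..U+00FF take two bytes in UTF-8: 110xxxxx 10xxxxxx.
            out.append(0xC0 | (b >> 6))
            out.append(0x80 | (b & 0x3F))
    return bytes(out)


def to_utf8(data: bytes, encoding: str, library_recode=None) -> bytes:
    """Use a rich recoding library if one is supplied (hypothetical hook),
    otherwise fall back to a handful of built-in encodings."""
    if library_recode is not None:
        return library_recode(data, encoding)
    name = encoding.lower()
    if name == "utf-8":
        return data
    if name in ("ascii", "us-ascii"):
        if any(b >= 0x80 for b in data):
            raise ValueError("non-ASCII byte in ASCII input")
        return data
    if name in ("iso-8859-1", "latin-1"):
        return latin1_to_utf8(data)
    raise LookupError("encoding not supported without a library: " + encoding)
```

Without a library, `to_utf8(b"\xb1", "iso-8859-2")` raises `LookupError`; with a converter passed as `library_recode` the same call is delegated to it.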
> The last question repeated but more
> amplified for windows - how well do you want to work with
You haven't finished the sentence.
> Why can't the source always be in UTF-8?
Because people often use other encodings. I use ISO-8859-2 almost exclusively
because there are too few UTF-8-capable editors.
> I'm not aware of any Windows
> editors that can handle UTF-{16,32}{,BE,LE} but not UTF-8. Or do you
> refer to allowing unibyte encodings (latin-*, cp-*)?
Yes, I didn't mean UTF-{16,32} but unibyte encodings.
> 2.1. Strings are internally in UTF-8 but the programmer is presented
> with an illusion of character-level indexing. This can be
> sub-optimal with some access patterns so you might want to
> provide some kind of iteration abstraction to reduce the use of
> indexing.
I think it would be inefficient or horribly complicated. Ideas like caching
the last character-to-byte mapping used in a string are not an option - that's
way too ugly. I'm free to define what indexing a string means, but it
should be consistent with the physical representation.
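A small sketch of why I consider the illusion expensive (Python for brevity, assuming the buffer holds valid UTF-8; `char_to_byte_offset` and `iter_chars` are hypothetical names, not part of any existing API). Finding the n-th character needs a linear scan, while a sequential iterator visits each byte only once:

```python
def char_to_byte_offset(buf: bytes, index: int) -> int:
    """Return the byte offset of the index-th character; O(n) scan."""
    if index < 0:
        raise IndexError("negative character index")
    seen = 0
    for offset, b in enumerate(buf):
        # Continuation bytes look like 10xxxxxx; any other byte
        # starts a new character.
        if (b & 0xC0) != 0x80:
            if seen == index:
                return offset
            seen += 1
    raise IndexError("character index out of range")


def iter_chars(buf: bytes):
    """Yield each character in one left-to-right pass over the bytes."""
    start = 0
    for offset in range(1, len(buf) + 1):
        # A character ends where the buffer ends or the next one starts.
        if offset == len(buf) or (buf[offset] & 0xC0) != 0x80:
            yield buf[start:offset].decode("utf-8")
            start = offset
```

A loop that calls `char_to_byte_offset` for every index is quadratic in the string length, whereas `iter_chars` stays linear. Indexing by byte offset, which matches the physical representation, needs no scan at all.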
> If you write a recoding
> subsystem anyway, why would you ever want to touch UTF-16?
Because I think it's widely used in the Windows API. Unfortunately I don't
program on Windows at all.
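For illustration only, this is what text looks like when recoded to UTF-16 code units, the form I understand wide-character Windows interfaces to expect (again Python for brevity; `to_utf16_units` is a hypothetical helper, and little-endian order is an assumption here):

```python
def to_utf16_units(text: str) -> list:
    """Return the UTF-16 code units of text as integers."""
    data = text.encode("utf-16-le")
    # Each code unit is two bytes, stored little-endian.
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]
```

Characters outside the Basic Multilingual Plane come out as two units (a surrogate pair), so UTF-16 is not a fixed-width encoding either.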
--
__("< Marcin Kowalczyk
\__/ [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/