Re: Strings in a programming language

Marcin 'Qrczak' Kowalczyk Thu, 03 Jul 2003 09:52:44 -0700

On Thu, 3 July 2003 18:17, Beni Cherniavsky wrote:

> How optimized should it be?


Like Lisp or Smalltalk, i.e. quite optimized for a dynamically typed 
language, but not as much as statically typed languages usually are.

> How much custom runtime libraries are you ready to write and use
> (vs. compiling to calls to standard libraries)?

I'm prepared to interface to iconv and such, and to different I/O systems on 
different platforms. I don't want to implement conversions from scratch.
Character predicates and string handling will be implemented from scratch.

I'm more worried about causing headaches for other people trying to interface 
to C libraries.

> Are you ready to require sane i18n libraries (GNU libc, libiconv,
> etc.) or do you want to work out-of-the-box with any unix vendor's
> half-baked i18n facilities?

It will probably have rich i18n support if libraries are available. With just 
ANSI C available it will fall back to minimal support for a few encodings.

I'm not sure what interesting things these libraries provide, besides 
recoding and character predicates.

> The last question repeated but more
> amplified for windows - how well do you want to work with

You haven't finished the sentence.

> Why can't the source always be in UTF-8?

Because people often use other encodings. I use ISO-8859-2 almost exclusively 
because there are too few UTF-8-capable editors.

> I'm not aware of any Windows
> editors that can handle UTF-{16,32}{,BE,LE} but not UTF-8.  Or do you
> refer to allowing unibyte encodings (latin-*, cp-*)?

Yes, I didn't mean UTF-{16,32} but unibyte encodings.
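
To illustrate what accepting a unibyte source encoding involves, here is a
minimal Python sketch, not the planned implementation. The function name
read_source and its declared_encoding parameter are hypothetical; the point
is only that the same byte means different characters until the encoding is
known, so the source has to be recoded before the compiler looks at it.

```python
def read_source(raw: bytes, declared_encoding: str) -> str:
    """Decode raw source bytes using the encoding the user declared.

    Hypothetical helper: a real compiler would get declared_encoding from
    a command-line option, the locale, or a marker in the file.
    """
    return raw.decode(declared_encoding)


# Byte 0xB1 is "ą" in ISO-8859-2 but "±" in ISO-8859-1.
polish = read_source(b"\xb1", "iso-8859-2")
latin1 = read_source(b"\xb1", "iso-8859-1")

# Once decoded, the text can be stored internally in any Unicode form.
as_utf8 = polish.encode("utf-8")
```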

> 2.1. Strings are internally in UTF-8 but the programmer is presented
>      with an illusion of character-level indexing.  This can be
>      sub-optimal with some access patterns so you might want to
>      provide some kind of iteration abstraction to reduce the use of
>      indexing.

I think it would be inefficient or horribly complicated. Ideas like caching 
the last character-to-byte mapping used in a string are not an option; they 
are way too ugly. I'm free to define what indexing a string should mean, but 
it should be consistent with the physical representation.
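
The cost being discussed can be sketched in Python as follows. This is an
illustration only, assuming well-formed UTF-8 input; the names utf8_char_len,
char_at and iter_chars are made up for the example. Indexing by character has
to scan from the start of the buffer, while an iterator only ever advances by
one character.

```python
def utf8_char_len(lead: int) -> int:
    """Byte length of a UTF-8 sequence, determined from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def char_at(buf: bytes, index: int) -> str:
    """Return the index-th character; linear time, scans from byte 0."""
    pos = 0
    for _ in range(index):
        if pos >= len(buf):
            raise IndexError(index)
        pos += utf8_char_len(buf[pos])
    if pos >= len(buf):
        raise IndexError(index)
    return buf[pos:pos + utf8_char_len(buf[pos])].decode("utf-8")


def iter_chars(buf: bytes):
    """Yield characters in order; each step skips exactly one sequence."""
    pos = 0
    while pos < len(buf):
        n = utf8_char_len(buf[pos])
        yield buf[pos:pos + n].decode("utf-8")
        pos += n


sample = "zażółć".encode("utf-8")  # 6 characters stored in 10 bytes
```

A loop that calls char_at(sample, i) for every i is quadratic in the length
of the string, whereas iter_chars(sample) visits each byte once.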

> If you write a recoding
> subsystem anyway, why would you ever want to touch UTF-16?

Because I think it's widely used in the Windows API. Unfortunately I don't 
program on Windows at all.
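
For reference, a small Python sketch of what recoding to UTF-16 produces.
The helper name utf16_units is hypothetical and no actual Windows call is
shown. It also shows that UTF-16 is itself a variable-width encoding:
characters outside the Basic Multilingual Plane take two 16-bit units.

```python
import struct


def utf16_units(text: str) -> list:
    """Return the 16-bit code units of text encoded as UTF-16LE."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


bmp = utf16_units("A")              # one unit
astral = utf16_units("\U0001D11E")  # U+1D11E needs a surrogate pair
```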

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
