Greetings!

Waldek Hebisch <de...@fricas.org> writes:

> On Fri, May 30, 2025 at 05:08:21PM +0200, Grégory Vanuxem wrote:
>> 
>> However, I noticed that about utf-8 support:
>> 
>> In SBCL or Clozure CL, the latter needs a special routine to handle utf-8
>> strings returned from the C wrapper:
>> 
>> (1) -> #"Hello world, Καλημέρα κόσμε, コンニチハ"
>> 
>>    (1)  34
>> 
>> Whereas in gcl27, 2.7.1-7 (the Debian sid package):
>> 
>> (2) -> #"Hello world, Καλημέρα κόσμε, コンニチハ"
>> 
>>    (2)  59
>> 
>> # is the operation that returns the number of elements in an Aggregate.
>> 
>> So I think handling strings returned by Julia (I use them sometimes,
>> principally for formatting purposes or regular-expression operations)
>> will probably become difficult with gcl27, since they are in utf-8. For BLAS,
>> LAPACK and purely numerical functions I don't think it will be a problem,
>> but for returned strings (char *) I wonder if there is a special function
>> to let GCL "know" they are in utf-8 and handle them correctly.
>> 
>
> A nice thing about utf-8 is that in most cases code expecting
> 8-bit characters will handle them correctly, so the only
> thing which needs to know about utf-8 is the input and display
> subsystem.  In particular, for regexes, in most cases you
> should be able to pass utf-8 strings to an 8-bit regex engine
> and obtain the correct result.
>

Yes, this has essentially allowed GCL to stick with the traditional
character definition to this point.  It's just character counting that's
missing.
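
To illustrate the point about 8-bit regex engines, here is a minimal
sketch in Python (chosen only for brevity; nothing here is GCL or FriCAS
code).  It runs Python's `re` module in bytes mode over the utf-8
encoding of the sample string.  The two accented Greek letters are
written as escapes and assumed to be U+1F73 and U+1F79, since that is
what makes the byte count come out to 59.

```python
import re

# Sample string from the thread, as raw utf-8 bytes.
# Assumption: the accented letters are U+1F73 and U+1F79.
text = "Hello world, Καλημ\u1f73ρα κ\u1f79σμε, コンニチハ".encode("utf-8")

# A multi-byte word is just a byte sequence to an 8-bit engine, and
# utf-8 guarantees it can only match on a character boundary.
needle = re.escape("コンニチハ".encode("utf-8"))
katakana_start = re.search(needle, text).start()   # a byte offset

# ASCII classes never match inside a multi-byte character, because
# every byte of such a character is >= 0x80.
ascii_words = re.findall(rb"[A-Za-z]+", text)

# The caveat: '.' consumes one byte, not one character.
first_dot = re.match(rb".", "コ".encode("utf-8")).group()

print(katakana_start)  # 44
print(ascii_words)     # [b'Hello', b'world']
print(first_dot)       # b'\xe3'
```

The "in most cases" qualifier shows up in the last line: anything that
counts or consumes single characters (`.`, `{n}`, character classes
containing non-ASCII letters) works on bytes rather than characters.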

> AFAICS the number 34 above is useless for formatting purposes.
> 59 looks like the correct number of bytes, which is crucial for
> manipulating the string using low-level operations.
> For screen positioning you need the number of positions on the
> screen, which seems to be 39.  To know this you need to
> know a lot of specific things, like which characters are
> double width and which are combining (so do not need their
> own position).
>

Interesting point.  So it's just the international user who wants (aref
utf-string 23).

Nonetheless GCL will eventually support this.  Current thoughts are
along the lines of emacs' implementation.  There was a discussion of
this on gcl-devel not too long ago.
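
For reference, the three numbers in this thread (34, 59, 39) and the
(aref utf-string 23) case can be reproduced with a short Python sketch.
This is only an illustration, not a description of how GCL or Emacs
does it.  The helper names `display_width` and `char_at` are made up
here, and the width rule is a simplification: wide and fullwidth East
Asian characters count as two columns, combining characters as zero,
everything else as one.  The accented Greek letters are again assumed
to be U+1F73 and U+1F79; with U+03AD and U+03CC the byte count would be
57 rather than 59.

```python
import unicodedata

# Accented letters written as escapes (assumed code points, see above).
s = "Hello world, Καλημ\u1f73ρα κ\u1f79σμε, コンニチハ"
raw = s.encode("utf-8")


def display_width(text):
    """Approximate number of terminal columns needed for text."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks share the previous column
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        width += 2 if wide else 1
    return width


def char_at(buf, index):
    """Return character number index (0-based) of utf-8 bytes buf.

    Scans from the start, counting every byte that is not a
    continuation byte (10xxxxxx), so the cost is linear in index.
    """
    count = -1
    for pos, byte in enumerate(buf):
        if (byte & 0xC0) != 0x80:
            count += 1
            if count == index:
                end = pos + 1
                while end < len(buf) and (buf[end] & 0xC0) == 0x80:
                    end += 1
                return buf[pos:end].decode("utf-8")
    raise IndexError(index)


print(len(s), len(raw), display_width(s))  # 34 59 39
print(char_at(raw, 23) == s[23])           # True
```

The linear scan in `char_at` is the cost of keeping 8-bit strings:
indexing by character is no longer a constant-time operation unless
some position cache or wider internal representation is added.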

> BTW: Clef has to handle display issues, so I wrote a few support
> routines for handling utf-8 (see 'dist_left' and 'dist_right' in
> 'src/clef/e_buf.c').  They use the Clef representation for the buffer,
> but the idea should be clear.

Will check it out -- thanks!
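
As a rough idea of what such cursor-movement helpers involve, here is a
sketch in Python.  It is not taken from Clef, and the names
`bytes_to_right` and `bytes_to_left` are hypothetical.  It assumes the
buffer is valid utf-8 held as a flat byte string and that the position
is a byte offset on a character boundary, and it steps by code point
only (no double-width or combining handling).

```python
def is_continuation(byte):
    """True for utf-8 continuation bytes, i.e. bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def bytes_to_right(buf, pos):
    """Bytes to skip from pos to reach the next character (0 at end)."""
    if pos >= len(buf):
        return 0
    n = 1
    while pos + n < len(buf) and is_continuation(buf[pos + n]):
        n += 1
    return n


def bytes_to_left(buf, pos):
    """Bytes to step back from pos to the previous character (0 at start)."""
    if pos <= 0:
        return 0
    n = 1
    while pos - n > 0 and is_continuation(buf[pos - n]):
        n += 1
    return n


buf = "aκコ".encode("utf-8")      # 1 + 2 + 3 bytes
print(bytes_to_right(buf, 1))     # 2
print(bytes_to_left(buf, 6))      # 3
```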

Take care,

>
> -- 
>                               Waldek Hebisch

-- 
Camm Maguire                                        c...@maguirefamily.org
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah
