Greetings!

Waldek Hebisch <de...@fricas.org> writes:

> On Fri, May 30, 2025 at 05:08:21PM +0200, Grégory Vanuxem wrote:
>>
>> However, I noticed this about utf-8 support:
>>
>> In SBCL or Clozure CL, the latter needs a special routine to handle utf-8
>> strings returned from the C wrapper:
>>
>> (1) -> #"Hello world, Καλημέρα κόσμε, コンニチハ"
>>
>> (1) 34
>>
>> Whereas in gcl27, 2.7.1-7 (the Debian sid package):
>>
>> (2) -> #"Hello world, Καλημέρα κόσμε, コンニチハ"
>>
>> (2) 59
>>
>> # is the operation that returns the number of elements in an Aggregate.
>>
>> So I think handling strings returned by Julia (I use them sometimes,
>> principally for formatting purposes or regular-expression operations)
>> will probably become difficult with gcl27, as they are in utf-8. For BLAS,
>> LAPACK and purely numerical functions I don't think it will be a problem,
>> but for returned strings (char *) I wonder if there is a special function
>> to let GCL "know" they are in utf-8 and handle them correctly.
>>
>
> The nice thing about utf-8 is that in most cases code expecting
> 8-bit characters will handle it correctly, so the only
> thing which needs to know about utf-8 is the input and display
> subsystem. In particular, for regexes, in most cases you
> should be able to pass utf-8 strings to an 8-bit regex engine
> and obtain the correct result.
>

Yes, this has essentially allowed GCL to stick with the traditional
character definition up to this point. It's just character counting
that's missing.

> AFAICS the number 34 above is useless for formatting purposes.
> 59 looks like the correct number of bytes, which is crucial for
> manipulating the string using low-level operations.
> For screen positioning you need the number of positions on the
> screen, which seems to be 39. To know this you need to
> know a lot of specific things, like which characters are
> double width and which are combining (so do not need their
> own position).
>

Interesting point. So it's just the international user who wants
(aref utf-string 23). Nonetheless GCL will eventually support this.
Current thoughts are along the lines of Emacs' implementation. There
was a discussion of this on gcl-devel not too long ago.

> BTW: Clef has to handle display issues, so I wrote a few support
> routines for handling utf-8 (see 'dist_left' and 'dist_right' in
> 'src/clef/e_buf.c'). They use the Clef representation for the buffer,
> but the idea should be clear.

Will check it out -- thanks!

Take care,

>
> --
> Waldek Hebisch

--
Camm Maguire c...@maguirefamily.org
==========================================================================
"The earth is but one country, and mankind its citizens." -- Baha'u'llah
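
For reference, here is a minimal sketch of the three different "lengths" discussed in this thread (bytes, code points, screen columns), plus a byte-oriented regex search over utf-8 data. It is written in Python rather than Lisp purely for illustration; the function names are hypothetical and are not part of GCL, FriCAS or Clef, and the column count is only an approximation based on the Unicode East Asian Width and combining-class properties.

```python
import re
import unicodedata


def byte_length(text: str) -> int:
    """Bytes in the UTF-8 encoding: what an 8-bit string implementation counts."""
    return len(text.encode("utf-8"))


def code_point_length(text: str) -> int:
    """Unicode code points: what a Unicode-aware string implementation counts."""
    return len(text)


def display_width(text: str) -> int:
    """Approximate screen columns (assumption: wide/fullwidth = 2, combining = 0)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks share the previous character's cell
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def find_bytes(needle: str, haystack: str):
    """Search with an 8-bit regex over UTF-8 bytes; return byte offsets or None."""
    match = re.search(re.escape(needle.encode("utf-8")), haystack.encode("utf-8"))
    return match.span() if match else None


if __name__ == "__main__":
    katakana = "\u30b3\u30f3\u30cb\u30c1\u30cf"  # the five katakana from the example
    accented = "e\u0301"                          # 'e' followed by a combining acute
    for sample in (katakana, accented):
        print(byte_length(sample), code_point_length(sample), display_width(sample))
    print(find_bytes("\u30cb", katakana))
```

The byte-level search returns byte offsets, not character indices, which is fine for low-level manipulation but not for something like (aref utf-string 23).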