On Thu, 20 Aug 2020 09:35:40 -0700, Charles Mills wrote:

>I wonder if it might make sense to go UTF-32 even to disk, but compress the 
>data.
>
>I wonder how well standard compression schemes work with UTF-32? Are they too 
>octet-oriented to work optimally?
> 
A non-scientific sample (wc columns are lines, words, bytes):
$ ls -l ~ | wc
     24     213    1403
$ ls -l ~ |                            gzip     | wc
      1       9     441
$ ls -l ~ | iconv -f UTF-8 -t UTF-32 | wc
     24     213    5616
$ ls -l ~ | iconv -f UTF-8 -t UTF-32 | gzip     | wc
      0       9     679
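
For anyone who wants to repeat the comparison without a shell handy, here is a
rough Python sketch of the same experiment. The function name
compressed_sizes and the made-up directory-listing text are my own
assumptions, not anything from the session above; Python's "utf-32" codec
prepends a 4-byte BOM, which matches the 5616 = 4 * 1403 + 4 bytes that iconv
produced.

```python
import gzip

def compressed_sizes(text: str) -> dict[str, tuple[int, int]]:
    """Return (raw bytes, gzip bytes) for the UTF-8 and UTF-32 forms of text."""
    sizes = {}
    for name in ("utf-8", "utf-32"):
        raw = text.encode(name)          # "utf-32" adds a 4-byte BOM
        sizes[name] = (len(raw), len(gzip.compress(raw)))
    return sizes

# Hypothetical stand-in for `ls -l ~` output: 24 ASCII lines.
sample = "".join(
    f"-rw-r--r--  1 user  staff  {n * 37:6d} Aug 20 09:{n:02d} file{n}.txt\n"
    for n in range(24)
)

for name, (raw, packed) in compressed_sizes(sample).items():
    print(f"{name:8s} raw={raw:6d} gzip={packed:6d}")
```

The exact numbers depend on the input, so treat any one run as no more
scientific than the sample above.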

>I wonder if one might write an LZW implementation that assumed 32-bit 
>characters.
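
As a rough illustration of what that could look like, here is a minimal
Python sketch of textbook LZW in which each input symbol is a whole Unicode
code point rather than an octet. Everything in it is an assumption made for
illustration: the names lzw_encode and lzw_decode, the choice to emit single
code points as their own scalar values, and the choice to number dictionary
phrases from 0x110000 upward. It emits a list of integers and does not
attempt bit-packing or dictionary resets, so it says nothing about real
compression ratios.

```python
FIRST_PHRASE_CODE = 0x110000  # one past the largest Unicode scalar value

def lzw_encode(text: str) -> list[int]:
    """LZW over code points: literals are ord(ch), phrases get codes >= 0x110000."""
    table: dict[str, int] = {}
    next_code = FIRST_PHRASE_CODE
    out: list[int] = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if len(candidate) == 1 or candidate in table:
            phrase = candidate            # keep extending the current match
        else:
            out.append(ord(phrase) if len(phrase) == 1 else table[phrase])
            table[candidate] = next_code  # learn the new phrase
            next_code += 1
            phrase = ch
    if phrase:
        out.append(ord(phrase) if len(phrase) == 1 else table[phrase])
    return out

def lzw_decode(codes: list[int]) -> str:
    """Inverse of lzw_encode; rebuilds the same phrase table on the fly."""
    table: dict[int, str] = {}
    next_code = FIRST_PHRASE_CODE
    out: list[str] = []
    prev = ""
    for code in codes:
        if code < FIRST_PHRASE_CODE:
            entry = chr(code)             # literal code point
        elif code in table:
            entry = table[code]
        else:
            entry = prev + prev[0]        # code not yet in table (KwKwK case)
        out.append(entry)
        if prev:
            table[next_code] = prev + entry[0]
            next_code += 1
        prev = entry
    return "".join(out)

print(lzw_encode("abababab"))
print(lzw_decode(lzw_encode("abababab")))
```

Because the literal alphabet is the whole code space, no dictionary
pre-loading is needed, but every output code needs at least 21 bits before
any entropy coding is applied.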

-- gil

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
