On Thu, 20 Aug 2020 09:35:40 -0700, Charles Mills wrote:
>I wonder if it might make sense to go UTF-32 even to disk, but compress the
>data.
>
>I wonder how well standard compression schemes work with UTF-32? Are they too
>octet-oriented to work optimally?
>
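A non-scientific sample (wc reports lines, words, and bytes; only the
byte count means anything for the gzip output):
$ ls -l ~ | wc
24 213 1403
$ ls -l ~ | gzip | wc
1 9 441
$ ls -l ~ | iconv -f UTF-8 -t UTF-32 | wc
24 213 5616
$ ls -l ~ | iconv -f UTF-8 -t UTF-32 | gzip | wc
0 9 679
So gzip recovers most of the fourfold expansion, but the compressed
UTF-32 is still about 1.5 times the size of the compressed UTF-8
(679 vs. 441 bytes).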
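For anyone who wants to repeat the comparison without a shell, here is a
minimal Python sketch of the same measurement. The helper name
compressed_sizes and the sample text are my own inventions for
illustration (the sample is a made-up stand-in for "ls -l" output, not
the listing measured above), so the numbers it prints will differ.

```python
import gzip


def compressed_sizes(text):
    """Return {encoding: (raw_bytes, gzipped_bytes)} for UTF-8 and UTF-32."""
    sizes = {}
    for encoding in ("utf-8", "utf-32"):
        raw = text.encode(encoding)  # Python's "utf-32" codec prepends a 4-byte BOM
        sizes[encoding] = (len(raw), len(gzip.compress(raw)))
    return sizes


# Hypothetical sample: 24 lines shaped roughly like a directory listing.
sample = "".join(
    "-rw-r--r--  1 user  staff  {:6d} Aug 20 09:{:02d} file{}.txt\n".format(i * 37, i, i)
    for i in range(24)
)

for enc, (raw, packed) in compressed_sizes(sample).items():
    print("{}: {} bytes raw, {} bytes gzipped".format(enc, raw, packed))
```

For pure-ASCII input the UTF-32 size is four times the UTF-8 size plus
four bytes, which matches 5616 = 4 x 1403 + 4 above; the extra four
bytes are presumably the byte-order mark.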
>I wonder if one might write an LZW implementation that assumed 32-bit
>characters.
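A rough sketch of what that might look like, in Python, treating each
Unicode code point as one symbol. This is a toy under stated
assumptions, not a reference to any existing implementation: the names
lzw_compress, lzw_decompress, and FIRST_PHRASE_CODE are hypothetical,
and rather than pre-seeding the dictionary with every possible
character, codes below 0x110000 simply stand for the code point itself.
It emits a list of integer codes and does not pack them into bits.

```python
FIRST_PHRASE_CODE = 0x110000  # one past the highest Unicode code point


def lzw_compress(text):
    """LZW over code points; single characters are coded as themselves."""
    table = {}  # multi-character phrase -> code
    next_code = FIRST_PHRASE_CODE
    out = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if len(candidate) == 1 or candidate in table:
            phrase = candidate  # keep extending the current match
        else:
            out.append(ord(phrase) if len(phrase) == 1 else table[phrase])
            table[candidate] = next_code
            next_code += 1
            phrase = ch
    if phrase:
        out.append(ord(phrase) if len(phrase) == 1 else table[phrase])
    return out


def lzw_decompress(codes):
    """Inverse of lzw_compress; rebuilds the phrase table on the fly."""
    table = {}  # code -> multi-character phrase
    next_code = FIRST_PHRASE_CODE
    out = []
    prev = None
    for code in codes:
        if code < FIRST_PHRASE_CODE:
            entry = chr(code)
        elif code in table:
            entry = table[code]
        elif code == next_code and prev is not None:
            entry = prev + prev[0]  # phrase used before the decoder has defined it
        else:
            raise ValueError("invalid LZW code: {}".format(code))
        out.append(entry)
        if prev is not None:
            table[next_code] = prev + entry[0]
            next_code += 1
        prev = entry
    return "".join(out)
```

A quick round trip such as lzw_decompress(lzw_compress(s)) == s is an
easy sanity check. Whether this would beat gzip on UTF-32 input is a
separate question that the sketch does not answer.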
-- gil