Bruno Haible wrote:
> Hi Pádraig,
>
>> So I'm wondering now why normalization functionality isn't in iconv?
>> Seems like a big omission to me.
>
[snip valid points on iconv limitations]

>> There is a mention of it here:
>> http://www.archivum.info/i18n-disc...@opensolaris.org/2006-08/msg00004.html
>
> This page mentions that some vendor iconv don't even get
> iconv_open ("UTF-8", "UTF-8") implemented right. You see how little you
> can portably expect from iconv (unless you consider installing GNU libiconv).
>
>> Then I also noticed `uconv` which is in the "icu" package of fedora at least.
>> To normalize text the following worked for me:
>>   uconv -x NFC < test.utf8
>>
>> So ... uconv already has it.
>> Do we really need another util in coreutils for this?
>
> ICU is certainly seminal, because it served as a testbed for the development
> of Unicode. But I shudder when I see these library sizes (ICU 3.6 on x86):
>
> $ size libicu*.so.*.0
>     text    data     bss      dec     hex filename
> 10152037     116       0 10152153  9ae8d9 libicudata.so.36.0
>  1215645   21760    1396  1238801  12e711 libicui18n.so.36.0
>    34402    2524      36    36962    9062 libicuio.so.36.0
>   245797    4644      88   250529   3d2a1 libicule.so.36.0
>    34011    1232       4    35247    89af libiculx.so.36.0
>   101228    1264       8   102500   19064 libicutu.so.36.0
>  1093450   28360    6364  1128174  1136ee libicuuc.so.36.0
>
> I cannot estimate how much of these 10 MB get actually loaded into a
> process' working set. 10 MB - this is 11 times the size of GNU libiconv
> with all its conversion tables!

$ uconv -x NFC&
$ sudo bin/ps_mem.py | grep uconv
 Private  +   Shared  =  RAM used       Program
  1.9 MiB + 788.0 KiB =   2.7 MiB       uconv

$ uconv -x NFC&
$ sudo bin/ps_mem.py | grep uconv
912.0 KiB +   2.2 MiB =   3.1 MiB       uconv (2)

> The benefit of a reimplementation is that
>   - It implements only the required specifications, does not carry the
>     historical baggage of 10 years of ICU, hence smaller code and table
>     sizes.
>   - When you find a bug or limitation, you have higher chances of getting it
>     fixed.

I don't doubt the usefulness of libiconv, though I'm still not sure
another "normalization util" is required when uconv is available.
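
For anyone reading along without ICU installed, here is a rough sketch of
what `uconv -x NFC < test.utf8` does to its input, using Python's standard
unicodedata module. It is only an illustration, not anything proposed in
this thread, and the name nfc_filter is made up for the example.

```python
import io
import unicodedata


def nfc_filter(src, dst):
    """Copy text from src to dst, applying NFC normalization to each line.

    nfc_filter is a hypothetical helper name used only for this sketch.
    Working line by line assumes the newline never combines with its
    neighbours, so the result matches normalizing the whole text at once.
    """
    for line in src:
        dst.write(unicodedata.normalize("NFC", line))


if __name__ == "__main__":
    # "e" followed by U+0301 COMBINING ACUTE ACCENT is the decomposed form;
    # NFC should compose the pair into the single code point U+00E9.
    out = io.StringIO()
    nfc_filter(io.StringIO("e\u0301\n"), out)
    print([hex(ord(c)) for c in out.getvalue()])
```

To use it as a filter in the same way as uconv, one could call
nfc_filter(sys.stdin, sys.stdout) in a UTF-8 locale.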

thanks again for all the info,
Pádraig.

_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils