Shalom, fine folks.

-- Short story: --
When ripping Hebrew CDs, the data I get from CDDB (or freeCDDB, I can't tell) is encoded with Aleph as 0xC3A0, Bet as 0xC3A1 and so on.

-- Longer story: --
I was able to convert it into proper utf8 [Aleph as (d7,90)] only via the pipeline:

    ... | iconv -f utf8 -t unicode | sed 's/\x0//g' | iconv -c -f iso88598 -t utf8

That is:
C3 A0 ==> `iconv -f utf8 -t unicode` ==> 00 E0
E0 hex = 224 dec, which is Aleph in iso88598, but for each byte I get an extra 00.
So the next part, `sed 's/\x0//g'`, discards the 00 bytes.
Then `iconv -f iso88598 -t utf8` is a trivial step, but without the `-c` it complains about illegal characters.

-- Some background: --
LANG=en_US.utf8   # but I had no success with any other LANG value
LC_* is undefined
LANGUAGE=en_US:en
KDE 4.2.4
Kubuntu 9.04
English interface (the Hebrew interface in KDE4 is currently broken)

-- Questions: --
1. Is there an encoding where Aleph is 0xC3A0? If so, what is it? If not, how did I end up with it?
2. Is there a less ugly way to get from Aleph=0xC3A0 to proper UTF8?
3. Is this a bug, or stupidity on my end?

Thank you for your attention.
__
Cheers,
Chen.
_______________________________________________
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
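
P.S. For reference, the three steps of the pipeline above can be sketched in Python. This is only a minimal sketch, under the assumption (not confirmed anywhere in this mail) that the CDDB data is really iso88598 text that got read as Latin-1 and then written out as utf8. The name `fix_hebrew` is hypothetical, and `errors="ignore"` is my stand-in for the `-c` flag of iconv.

```python
def fix_hebrew(raw: bytes) -> str:
    """Undo the suspected double encoding (assumption: ISO-8859-8 read as Latin-1)."""
    # Step 1: decode the UTF-8 bytes into code points (C3 A0 -> U+00E0).
    text = raw.decode("utf-8")
    # Step 2: turn every code point below 256 back into a single byte
    # (U+00E0 -> E0). Code points that do not fit are dropped.
    single_bytes = text.encode("latin-1", errors="ignore")
    # Step 3: read those bytes as ISO-8859-8 (E0 -> Aleph, U+05D0).
    # Bytes that ISO-8859-8 does not define are dropped, like `iconv -c`.
    return single_bytes.decode("iso8859_8", errors="ignore")


if __name__ == "__main__":
    fixed = fix_hebrew(b"\xc3\xa0\xc3\xa1")  # Aleph, Bet as received
    print(fixed.encode("utf-8").hex(" "))
```

If the assumption holds, the printed bytes should be the proper utf8 forms (d7 90 for Aleph); plain ASCII text passes through unchanged.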