On Thu, 24 May 2001, Andrew Dunbar wrote:

 Hi Andrew,


> I've noticed Korean problems in several areas for a while now and
> have finally got around to investigating.  It turns out that libiconv
> has completely broken code for KSC_5601!
> 
> Here's an excerpt from the Unicode conversion data from their site:
> 
> #    Format: Three tab-separated columns
> #        Column #1 is the Unified Hangeul code (in hex)
> #        Column #2 is the Unicode (in hex as 0xXXXX)
> #        Column #3 is the Unicode name (follows a comment sign, '#')
> #
> 0x8141        0xAC02  # HANGUL SYLLABLE KIYEOK-A-SSANGKIYEOK
> <much more snipped>
> 
> This means 0x81 0x41 is a correct multibyte sequence which should
> be converted into the sixteen bit value 0xAC02.

> Here's an excerpt from libiconv ksc5601.h:
> 
> static int
> ksc5601_mbtowc (conv_t conv, wchar_t *pwc, const unsigned char *s, int n)
> {
>   unsigned char c1 = s[0];
>   if ((c1 >= 0x21 && c1 <= 0x2c) || (c1 >= 0x30 && c1 <= 0x48)
>       || (c1 >= 0x4a && c1 <= 0x7d)) {
>     <much code snipped>
>   }
>   return RET_ILSEQ;
> }

 Yes, it seems to be broken. Or maybe 's' is supposed to point directly
at the byte after 0x81, i.e. at 0x41, when this function is called?
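
 To make the point concrete, here is a minimal sketch of just that
first-byte test, copied from the excerpt above. The names
first_byte_accepted and demo are mine, not libiconv's, and the snipped
body of the function is not reproduced, so this only shows which bytes
get past the first check, not how they would be decoded afterwards.

```c
#include <assert.h>
#include <stdio.h>

/* First-byte range test as it appears in the quoted ksc5601_mbtowc
   excerpt.  Returns 1 if c1 passes the test, 0 otherwise.
   (Hypothetical helper name; not part of libiconv.) */
static int first_byte_accepted(unsigned char c1)
{
  return (c1 >= 0x21 && c1 <= 0x2c)
      || (c1 >= 0x30 && c1 <= 0x48)
      || (c1 >= 0x4a && c1 <= 0x7d);
}

/* Checks both bytes of the sequence 0x81 0x41 from the mapping table. */
void demo(void)
{
  const unsigned char seq[2] = { 0x81, 0x41 };

  assert(!first_byte_accepted(seq[0]));  /* 0x81 is above 0x7d */
  assert(first_byte_accepted(seq[1]));   /* 0x41 lies in 0x30..0x48 */

  printf("0x%02x accepted: %d\n", seq[0], first_byte_accepted(seq[0]));
  printf("0x%02x accepted: %d\n", seq[1], first_byte_accepted(seq[1]));
}
```

 So if 's' pointed at 0x41, the test would pass, but whether the result
would then be the expected 0xAC02 depends on the code that was snipped.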
 
> You'll see that our very first byte, 0x81, does not pass the very first
> test!
> 
> Something is very wrong here.  To check for yourself try loading any
> Korean plain text file when using a Korean locale and compare with
> another program which also handles Korean encoded files.  Saving is
> also broken, as is input and anything that treats Korean as multibyte.

  Of course it would be nice to report this to libiconv's maintainer.
 
 Best regards,
  -Vlad

