Your message dated Sun, 15 Apr 2007 01:25:44 +0200 with message-id <[EMAIL PROTECTED]> and subject line Bug#372515: iconv(): Returns EILSEQ when it can't convert to the output encoding. has caused the attached Bug report to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what I am
talking about this indicates a serious mail system misconfiguration
somewhere.  Please contact me immediately.)

Debian bug tracking system administrator
(administrator, Debian Bugs database)
--- Begin Message ---
Package: libc6
Version: 2.3.6-15
Severity: important

Hi,

It seems that iconv() returns -1 and sets errno to EILSEQ on valid
input that it can't convert to the output encoding.  It shouldn't be
doing that, since it is valid input.

This can be shown simply using the iconv util, since it reacts
the same way.  A simple latin1 file:
$ cat test.txt
tést
$ iconv -f latin1 -t ASCII test.txt > /dev/null
iconv: illegal input sequence at position 1
$ iconv -f latin1 -t UTF-8 test.txt > /dev/null
$

From the manpage:
EILSEQ An invalid multibyte sequence has been encountered in the input.

From Single Unix Specification 3:
[EILSEQ]
    Input conversion stopped due to an input byte that does not
    belong to the input codeset.

It also says:
    If iconv() encounters a character in the input buffer that is
    valid, but for which an identical character does not exist in the
    target codeset, iconv() shall perform an implementation-defined
    conversion on this character.

Instead of doing an "implementation-defined conversion", it's
returning an error, and saying the input is invalid, while the
input is clearly valid.  I would rather it actually followed the
standard and did some conversion, even if it just turned the
character into a '?' or something.


Kurt
--- End Message ---
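
The reproduction in the report above depends on how test.txt was saved. A minimal sketch that writes the Latin-1 bytes explicitly and reports only iconv's exit status might look like the following. The helper name try_convert and the temporary file are illustrative assumptions, not part of the original report, and the "failed" result for ASCII is the glibc behaviour the report describes; other iconv implementations may differ.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): convert a Latin-1 file to the
# target encoding given in $1 and print "ok" if iconv exits 0, "failed"
# otherwise. Converted output and diagnostics are discarded.
try_convert() {
    if iconv -f ISO-8859-1 -t "$1" "$2" > /dev/null 2>&1; then
        echo ok
    else
        echo failed
    fi
}

# Write "tést" as Latin-1: octal 351 is byte 0xE9, the Latin-1 code for é.
sample=$(mktemp)
printf 't\351st\n' > "$sample"

# With glibc's iconv, as described in the report, the ASCII conversion is
# expected to fail and the UTF-8 conversion to succeed.
echo "ASCII: $(try_convert ASCII "$sample")"
echo "UTF-8: $(try_convert UTF-8 "$sample")"

rm -f "$sample"
```

Writing the file with printf avoids any dependence on the terminal or editor encoding, so the input is known to be valid Latin-1 when the ASCII conversion is rejected.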
--- Begin Message ---
On Fri, Jun 09, 2006 at 10:12:26PM +0200, Kurt Roeckx wrote:
> Package: libc6
> Version: 2.3.6-15
> Severity: important
>
> Hi,
>
> It seems that iconv() returns -1 and sets errno to EILSEQ on valid
> input that it can't convert to the output encoding.  It shouldn't be
> doing that, since it is valid input.
>
> This can be shown simply using the iconv util, since it reacts
> the same way.  A simple latin1 file:
> $ cat test.txt
> tést
> $ iconv -f latin1 -t ASCII test.txt > /dev/null
> iconv: illegal input sequence at position 1
> $ iconv -f latin1 -t UTF-8 test.txt > /dev/null
> $
>
> From the manpage:
> EILSEQ An invalid multibyte sequence has been encountered in the input.
>
> From Single Unix Specification 3:
> [EILSEQ]
>     Input conversion stopped due to an input byte that does not
>     belong to the input codeset.
>
> It also says:
>     If iconv() encounters a character in the input buffer that is
>     valid, but for which an identical character does not exist in the
>     target codeset, iconv() shall perform an implementation-defined
>     conversion on this character.
>
> Instead of doing an "implementation-defined conversion", it's
> returning an error, and saying the input is invalid, while the
> input is clearly valid.  I would rather it actually followed the
> standard and did some conversion, even if it just turned the
> character into a '?' or something.

By default, iconv is strict, and won't silently replace bad chars.
If you want it to perform approximations, you can ask for
ascii//TRANSLIT rather than ascii, or for ascii//IGNORE to ignore
untranslatable characters.

If you want a less strict tool, recode is what you seek.  iconv is
meant to be a very strict one, usable for instance to check the
validity of an encoded string (iconv -f utf8 -t utf8 < foo > /dev/null
is a trick to verify that a text is valid utf8).

-- 
·O·  Pierre Habouzit
··O  [EMAIL PROTECTED]
OOO  http://www.madism.org
--- End Message ---
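
The options mentioned in the reply can be tried with a short script. This is only a sketch: the helper names to_ascii and is_valid_utf8 and the temporary file are illustrative assumptions, and the character that //TRANSLIT substitutes depends on the iconv implementation and the current locale, so no particular output is promised here.

```shell
#!/bin/sh
# Hypothetical helper: convert Latin-1 on stdin to ASCII, with an optional
# suffix in $1 ("", "//TRANSLIT" or "//IGNORE"). The exit status is
# deliberately ignored, because it can be non-zero even when output has
# been produced.
to_ascii() {
    iconv -f ISO-8859-1 -t "ASCII$1" 2>/dev/null || true
}

# Hypothetical helper: succeeds only if the file named in $1 is valid
# UTF-8. This is the "-f utf8 -t utf8" trick from the reply.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}

# "tést" as Latin-1 bytes (octal 351 is 0xE9, é in Latin-1).
sample=$(mktemp)
printf 't\351st\n' > "$sample"

# //TRANSLIT approximates é; //IGNORE drops it.
printf 'TRANSLIT: '; to_ascii //TRANSLIT < "$sample"
printf 'IGNORE:   '; to_ascii //IGNORE < "$sample"

# The Latin-1 file is not valid UTF-8, so the strict check rejects it.
if is_valid_utf8 "$sample"; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8"
fi

rm -f "$sample"
```

The validity check relies on exactly the strictness that the bug report objects to: because iconv refuses input it cannot convert instead of substituting a placeholder, its exit status can serve as a yes/no answer.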