On Monday 14 December 2015 18:33:38 Eli Zaretskii wrote:
> > Date: Sun, 13 Dec 2015 20:04:31 +0100
> > From: "Andries E. Brouwer" <[email protected]>
> > Cc: "Andries E. Brouwer" <[email protected]>, [email protected]
> >
> > On Sun, Dec 13, 2015 at 08:01:27PM +0200, Eli Zaretskii wrote:
> > > If no one is going to pick up the gauntlet, I will sit down and do it
> > > myself, although I'm terribly busy with Emacs 25.1 release.
> >
> > Good!
>
> While working on this, I bumped into 2 related issues:
>
> 1. The functions that call 'iconv' (in iri.c) don't make a point of
> flushing the last portion of the converted URL after 'iconv'
> returns successfully having converted the input string in its
> entirety. IME, you need then to call 'iconv' one last time with
> either the 2nd or the 3rd argument set to NULL, otherwise
> sometimes the last converted character doesn't get output. In my
> case, some URLs converted from CP1255 to UTF-8 lost their last
> character. It sounds like no one has actually used this
> conversion in iri.c, except for trivially converting UTF-8 to
> itself. Is that possible/reasonable?
You are absolutely right.
Attached is a small C test program that demonstrates (and fixes) the problem.
Regards, Tim
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <iconv.h>

int main(void)
{
    // const char *src_encoding = "iso-8859-1";
    // const char *dst_encoding = "UTF-8";
    // const char *src = "hallö";
    const char *src_encoding = "CP1255";
    const char *dst_encoding = "UTF-8";
    const char *src = "\xF9._\xF9\xF4\xF8\xE4";
    iconv_t cd;
    int ret = 1;

    cd = iconv_open(dst_encoding, src_encoding);
    if (cd != (iconv_t)-1) {
        char *tmp = (char *) src; // iconv won't change where src points to, but changes tmp itself
        size_t tmp_len = strlen(src);
        size_t dst_len = tmp_len * 6, dst_len_tmp = dst_len;
        char *dst = malloc(dst_len + 1), *dst_tmp = dst;

        if (!dst) {
            fprintf(stderr, "Failed to allocate %zu bytes\n", dst_len + 1);
            iconv_close(cd);
            return ret;
        }

        if (iconv(cd, &tmp, &tmp_len, &dst_tmp, &dst_len_tmp) != (size_t)-1) {
            /* state before the flush: the last character may still be missing from dst */
            printf("inbytesleft=%zu outbytesleft=%zu\n", tmp_len, dst_len_tmp);
            printf("converted '%s' (%s) -> '%.*s' (%s)\n", src, src_encoding,
                   (int) (dst_len - dst_len_tmp), dst, dst_encoding);

            /* kick out a possible 'shift sequence', else we may lose a last character stuck in cd */
            if (iconv(cd, NULL, NULL, &dst_tmp, &dst_len_tmp) != (size_t)-1) {
                /* state after the flush: dst now holds the complete conversion */
                printf("inbytesleft=%zu outbytesleft=%zu\n", tmp_len, dst_len_tmp);
                printf("converted '%s' (%s) -> '%.*s' (%s)\n", src, src_encoding,
                       (int) (dst_len - dst_len_tmp), dst, dst_encoding);
            }

            ret = 0;
        } else
            fprintf(stderr, "Failed to convert '%s' string into '%s' (%d)\n",
                    src_encoding, dst_encoding, errno);

        free(dst);
        iconv_close(cd);
    } else
        fprintf(stderr, "Failed to prepare encoding '%s' into '%s' (%d)\n",
                src_encoding, dst_encoding, errno);

    return ret;
}
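
For callers such as the ones in iri.c, the same convert-then-flush pattern could be wrapped in one function so that the final 'iconv' call can't be forgotten. Below is a minimal sketch of that idea, not the actual wget code: the name convert_and_flush and the buffer sizing are my own assumptions, and the NUL termination only makes sense for byte-oriented target encodings such as UTF-8.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of wget's iri.c): convert src from
 * src_encoding to dst_encoding. Returns a malloc'ed, NUL-terminated
 * string that the caller must free, or NULL on any failure. */
static char *convert_and_flush(const char *src_encoding,
                               const char *dst_encoding,
                               const char *src)
{
    assert(src_encoding != NULL && dst_encoding != NULL && src != NULL);

    iconv_t cd = iconv_open(dst_encoding, src_encoding);
    if (cd == (iconv_t)-1)
        return NULL;

    char *in = (char *) src;              /* iconv advances 'in', never modifies src */
    size_t in_left = strlen(src);
    size_t out_size = in_left * 6 + 16;   /* assumed generous bound, plus room for a reset sequence */
    char *out = malloc(out_size + 1);     /* +1 for the terminating NUL */
    char *out_pos = out;
    size_t out_left = out_size;
    int ok = 0;

    if (out != NULL
        /* Step 1: convert the whole input. */
        && iconv(cd, &in, &in_left, &out_pos, &out_left) != (size_t)-1
        /* Step 2: NULL input flushes whatever is still held in cd. */
        && iconv(cd, NULL, NULL, &out_pos, &out_left) != (size_t)-1) {
        *out_pos = '\0';
        ok = 1;
    }

    iconv_close(cd);
    if (!ok) {
        free(out);                        /* free(NULL) is harmless */
        return NULL;
    }
    return out;
}
```

A caller would then write something like convert_and_flush("CP1255", "UTF-8", url) and free the result when done. The sketch treats every 'iconv' error (including E2BIG) as a plain failure; real code would probably want to grow the buffer and retry.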