Package: librecode0
Version: 3.6-12
Severity: normal
According to the info page, recode_perform_task() should set
task->error_so_far to RECODE_UNTRANSLATABLE if the input contains
characters that cannot be represented in the output charset.
However, it reports RECODE_INVALID_INPUT when translating certain
characters from UTF-8 to Latin-1, even though the input is valid UTF-8.
Below is an example C program that shows the bug. It tries to translate
the string "á ç α ζ" from UTF-8 into Latin-1. The á and ç are converted
fine, but it chokes on the alpha (as it should, because Latin-1 has no
alpha). However, the error code it reports is 4
(== RECODE_INVALID_INPUT) instead of 3 (== RECODE_UNTRANSLATABLE).
This bug makes it impossible to distinguish between invalid input
(which, in a user application, should be reported as an error) and
characters that simply cannot be represented in the target charset
(which could, for example, be replaced by a '?').
#include <stdio.h>
#include <stdbool.h>
#include <recodext.h>
#include <string.h>

int
main ()
{
  /* UTF-8 test string: 2 chars (a-acute, c-cedilla) representable in
   * Latin-1, followed by 2 chars (alpha, zeta) that cannot be
   * represented in Latin-1 */
  char greek_utf_str[] = "\303\241 \303\247 \316\261 \316\266";
  char buf[100] = "";
  RECODE_OUTER outer = recode_new_outer (false);
  RECODE_REQUEST request = recode_new_request (outer);
  RECODE_TASK task;
  bool success;
  char *p;

  if (!recode_scan_request (request, "utf-8..latin1"))
    {
      fprintf (stderr, "recode_scan_request failed\n");
      return 1;
    }
  task = recode_new_task (request);
  task->input.buffer = &(greek_utf_str[0]);
  task->input.cursor = task->input.buffer;
  /* Stop before the terminating NUL so it is not recoded as well. */
  task->input.limit = task->input.buffer + strlen (greek_utf_str);
  task->output.buffer = &(buf[0]);
  task->output.cursor = task->output.buffer;
  task->output.limit = task->output.buffer + sizeof (buf);

  success = recode_perform_task (task);
  printf ("task completed with error %i (recode_perform_task returned %s)\n",
          (int) task->error_so_far, success ? "true" : "false");
  printf ("output buffer: ");
  /* Walk a separate pointer so task->output.buffer is left untouched. */
  for (p = task->output.buffer; p < task->output.cursor; p++)
    printf ("%02X ", (unsigned char) *p);
  printf ("\n");

  recode_delete_task (task);
  recode_delete_request (request);
  recode_delete_outer (outer);
  return 0;
}
-- System Information:
Debian Release: testing/unstable
APT prefers unstable
APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.14.3
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Versions of packages librecode0 depends on:
ii libc6 2.3.5-6 GNU C Library: Shared libraries an
librecode0 recommends no packages.
-- no debconf information