Hello,
I was looking at glibc's scanf behaviour regarding locale-specific
digit grouping (using the `'' flag) and am interested in feedback
before I file a bug report. In particular, I'd like to hear some
thoughts regarding the questions of enforced/ignored grouping and
handling of 0-padding, as illustrated by tests 2, 3, and 4 below.
A digit grouping flag for scanf seems to be a glibc extension, as
I can't find any standards document describing one. (If there is
such a document, or even a working draft, I'd love to know.) The
glibc manual states:
    For all the above number parsing formats there is an additional
    optional flag `''. When this flag is given the `scanf' function
    expects the number represented in the input string to be formatted
    according to the grouping rules of the currently selected locale.
which seems to clearly state that digit grouping rules (if they exist)
will be enforced. But this isn't what happens in practice.
Regarding the output below for the en_US locale:
All Tests:
Clearly it is incorrect for all input to be consumed by these
tests, with the possible exception of (2) if grouping is not
strictly enforced (see below).
Test 1:
The ANSI/ISO C99 standard states that the fscanf function pushes
back at most one input character onto the input stream. The behavior
of sscanf should be analogous. Hence, I would have expected sscanf
to return 0 since it could not push back both `,' and `x'.
Tests 2 and 3:
The result for (3) is obviously due to a bug. But the general
question is, should grouping be enforced or ignored if the initial
digit block appears to violate the grouping rules? Test 2 shows
that glibc does not enforce grouping in this case.
Test 4:
The result appears to suffer from the same bug as Test 3. But there
is an additional issue raised here. Consider that the test string
could be the output of printf("%0'13ldx", 12345L); in the en_US
locale. Since grouping takes place before 0-padding, should leading
zeros be ignored when checking for correct grouping? Should the
value of `l' be 0, 12, or 12345?
Personally, I think that grouping should be enforced. So I would have
expected Tests 2 and 3 to set `l' to 123, since grouping was explicitly
requested. But I'm not so sure about how to handle 0-padding. I can
see arguments for either 0 or 12345.
Thanks for any feedback,
Manuel
--------------------- test output --------------------------------
locale: C grouping: 0
 Test  0 -        "12,3456x" l =       12 : c = ',' : n= 2 ",3456x"
 Test  1 -        "12,345,x" l =       12 : c = ',' : n= 2 ",345,x"
 Test  2 -          "12345x" l =    12345 : c = 'x' : n= 5 "x"
 Test  3 -      "12345,678x" l =    12345 : c = ',' : n= 5 ",678x"
 Test  4 -  "000000012,345x" l =       12 : c = ',' : n= 9 ",345x"
locale: en_US grouping: 3 3 0
 Test  0 -        "12,3456x" l =    12345 : c = 'x' : n= 7 "x"
 Test  1 -        "12,345,x" l =    12345 : c = 'x' : n= 7 "x"
 Test  2 -          "12345x" l =    12345 : c = 'x' : n= 5 "x"
 Test  3 -      "12345,678x" l =     1234 : c = 'x' : n= 9 "x"
 Test  4 -  "000000012,345x" l =        1 : c = 'x' : n=13 "x"
-------------------------- test code ------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <limits.h>
static const char *locales[] = {
    "C", "en_US", NULL
};

static const char *tests[] = {
    "12,3456x",
    "12,345,x",
    "12345x",
    "12345,678x",
    "000000012,345x",
    NULL
};

int main(void)
{
    long int l;
    const char **pl;
    const char **pt;
    struct lconv *lc;
    const char *g;
    int n, r;
    char c[1];

    for (pl = locales ; *pl ; pl++) {
        if (!setlocale(LC_NUMERIC, *pl)) {
            printf("\nsetlocale for %s failed!\n", *pl);
            continue;
        }

        /* Print the locale's grouping string up to its terminator. */
        lc = localeconv();
        printf("\nlocale: %s grouping:", *pl);
        g = lc->grouping;
        do {
            printf(" %d", *g);
            if ((*g == 0) || (*g == CHAR_MAX)) {
                break;
            }
            ++g;
        } while (1);
        printf("\n");

        for (pt = tests ; *pt ; pt++) {
            /* %2d and %* take int arguments, so cast explicitly. */
            printf(" Test %2d - %*s\"%s\" l = ",
                   (int)(pt - tests), (int)(15 - strlen(*pt)), "", *pt);
            n = 0;
            /* %'ld uses the grouping flag; %n records how many
               characters were consumed before %c reads the next one. */
            r = sscanf(*pt, "%'ld%n%c", &l, &n, c);
            if (r < 1) {
                printf("failed\n");
            } else if (r == 1) {
                printf("%8ld : c unread : n=%2d \"%s\"\n", l, n, *pt + n);
            } else {
                printf("%8ld : c = '%c' : n=%2d \"%s\"\n", l, *c, n, *pt + n);
            }
        }
    }
    printf("\n");
    return EXIT_SUCCESS;
}
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/