Hello,

I was looking at glibc's scanf behaviour regarding locale-specific
digit grouping (using the `'' flag) and am interested in feedback
before I file a bug report.  In particular, I'd like to hear some
thoughts regarding the questions of enforced/ignored grouping and
handling of 0-padding, as illustrated by tests 2, 3, and 4 below.

A digit grouping flag for scanf seems to be a glibc extension, as
I can't find any standards document describing one.  (If there is
such a document, or even a working draft, I'd love to know.) The
glibc manual states

   For all the above number parsing formats there is an additional
   optional flag `''.  When this flag is given the `scanf' function
   expects the number represented in the input string to be formatted
   according to the grouping rules of the currently selected locale.

which seems to clearly state that digit grouping rules (if they exist)
will be enforced.  But this isn't what happens in practice.

  Regarding the output below for the en_US locale:

  All Tests:
    Clearly it is incorrect for the conversion to consume all input
    up to the `x' in these tests, with the possible exception of (2)
    if grouping is not strictly enforced (see below).
  
  Test 1:
    The ANSI/ISO C99 standard states that the fscanf function pushes
    back at most one input character onto the input stream.  The behavior
    of sscanf should be analogous.  Hence, I would have expected sscanf
    to return 0 since it could not push back both `,' and `x'.

  Tests 2 and 3:
    The result for (3) is obviously due to a bug.  But the general
    question is, should grouping be enforced or ignored if the initial
    digit block appears to violate the grouping rules?  Test 2 shows
    that glibc does not enforce grouping in this case.
  
  Test 4:
    The result appears to suffer from the same bug as Test 3.  But there
    is an additional issue raised here.  Consider that the test string
    could be the output of   printf("%0'13ldx", 12345L);  in the en_US
    locale.  Since grouping takes place before 0-padding, should leading
    zeros be ignored when checking for correct grouping?  Should the
    value of `l' be 0, 12, or 12345?

Personally, I think that grouping should be enforced.  So I would have
expected Tests 2 and 3 to set `l' to 123, since grouping was explicitly
requested.  But I'm not so sure about how to handle 0-padding.  I can
see arguments for either 0 or 12345.
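
To make the "enforced" reading concrete, here is a rough sketch of what
strict parsing might look like.  It is not glibc code: parse_grouped and
its parameters are names made up for illustration.  It assumes a single
repeating group size and a one-character separator, works on an
in-memory string (so backing up over a stray separator is free), and
does no sign handling or overflow checking.  The skip_zero_pad argument
selects between the two 0-padding readings discussed for Test 4.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, NOT part of glibc.  Parses the longest prefix of
 * `s' that is a correctly grouped non-negative integer: a leading block
 * of 1..group digits, then zero or more blocks of a separator followed
 * by exactly `group' digits.  If skip_zero_pad is nonzero, leading zeros
 * are treated as padding and skipped before grouping is checked.
 * Stores the value in *out and returns the number of characters
 * consumed, or returns 0 (leaving *out untouched) if no digit is found.
 */
size_t parse_grouped(const char *s, char sep, int group,
                     int skip_zero_pad, long *out)
{
    size_t i = 0, start;
    long val = 0;
    int k;

    assert(s != NULL && out != NULL && group > 0);

    if (skip_zero_pad) {
        /* Skip padding zeros, keeping the last zero if nothing follows. */
        while (s[i] == '0' && isdigit((unsigned char)s[i + 1]))
            i++;
    }

    /* Leading block: at most `group' digits. */
    start = i;
    while (i - start < (size_t)group && isdigit((unsigned char)s[i])) {
        val = val * 10 + (s[i] - '0');
        i++;
    }
    if (i == start)
        return 0;

    /* Further blocks: separator plus exactly `group' digits. */
    while (s[i] == sep) {
        for (k = 1; k <= group; k++) {
            if (!isdigit((unsigned char)s[i + k]))
                break;
        }
        if (k <= group)
            break;              /* incomplete block: separator stays unread */
        for (k = 1; k <= group; k++)
            val = val * 10 + (s[i + k] - '0');
        i += (size_t)group + 1;
    }

    *out = val;
    return i;
}
```

Called with `,' as the separator and a group size of 3, this sketch
stops after "123" for Tests 2 and 3 and after "12,345" for Tests 0
and 1.  For Test 4 it gives 0 with skip_zero_pad off and 12345 with it
on, which are the two outcomes I can see arguments for.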

Thanks for any feedback,

Manuel

--------------------- test output  --------------------------------

locale: C   grouping: 0
  Test  0 -        "12,3456x"  l =       12 : c = ',' : n= 2 ",3456x"
  Test  1 -        "12,345,x"  l =       12 : c = ',' : n= 2 ",345,x"
  Test  2 -          "12345x"  l =    12345 : c = 'x' : n= 5 "x"
  Test  3 -      "12345,678x"  l =    12345 : c = ',' : n= 5 ",678x"
  Test  4 -  "000000012,345x"  l =       12 : c = ',' : n= 9 ",345x"

locale: en_US   grouping: 3 3 0
  Test  0 -        "12,3456x"  l =    12345 : c = 'x' : n= 7 "x"
  Test  1 -        "12,345,x"  l =    12345 : c = 'x' : n= 7 "x"
  Test  2 -          "12345x"  l =    12345 : c = 'x' : n= 5 "x"
  Test  3 -      "12345,678x"  l =     1234 : c = 'x' : n= 9 "x"
  Test  4 -  "000000012,345x"  l =        1 : c = 'x' : n=13 "x"

-------------------------- test code ------------------------------

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <limits.h>

static const char *locales[] = {
    "C", "en_US", NULL
};

static const char *tests[] = {
    "12,3456x",
    "12,345,x",
    "12345x",
    "12345,678x",
    "000000012,345x",
    NULL
};

int main(void)
{
    long int l;
    const char **pl;
    const char **pt;
    struct lconv *lc;
    const char *g;
    int n, r;
    char c[1];

    for (pl = locales ; *pl ; pl++) {
        if (!setlocale(LC_NUMERIC, *pl)) {
            printf("\nsetlocale for %s failed!\n", *pl);
            continue;
        }

        lc = localeconv();

        printf("\nlocale: %s   grouping:", *pl);
        g = lc->grouping;
        do {
            printf(" %d", *g);
            if ((*g == 0) || (*g == CHAR_MAX)) {
                break;
            }
            ++g;
        } while (1);
        printf("\n");

        for (pt = tests ; *pt ; pt++) {
            /* %d and %* both require int arguments, so cast explicitly. */
            printf("  Test %2d - %*s\"%s\"  l = ",
                   (int)(pt - tests), (int)(15 - strlen(*pt)), "", *pt);
            n = 0;
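            /* %'ld parses a grouped long, %n records the characters
               consumed so far, and %c reads the next character. */
            r = sscanf(*pt, "%'ld%n%c", &l, &n, c);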
            if (r < 1) {
                printf("failed\n");
            } else if (r == 1) {
                printf("%8ld : c unread : n=%2d \"%s\"\n", l, n, *pt + n);
            } else {
                printf("%8ld : c = '%c' : n=%2d \"%s\"\n", l, *c, n, *pt + n);
            }
        }

    }
    printf("\n");

    return EXIT_SUCCESS;
}

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
