-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 [please keep replies on the list]
According to James Strother on 12/20/2008 7:41 PM: > Hello Eric, > > I tried 6.12 and it no longer spills binary characters onto the screen, > but instead it scans through the entire file looking for valid MD5 lines > before producing the error than no valid MD5 lines were found. > > This is certainly better, but I expect that someone unfamiliar with md5sum > who made the same mistake that I did would find this behavior quite > perplexing. It would seem that md5sum is checking their file, but then > it wouldn't produce any output. (I only put this forth because I work in > a lab of linux-newbies and I field similar issues every day). > > I couldn't think of a situation in which non-printable text could appear > in a valid checksum file. If it is the case that non-printable text is > always invalid, then it's easy enough to check for it and produce a > specific error. Here is a patch that does it: Thanks for the proposed patch; but before we can even consider applying it, it would be better to reformulate it against the latest sources, per the suggestions in HACKING: http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=HACKING;h=6eb0480;hb=4000c35 Since you are proposing to affect user-visible behavior, you would also need to patch NEWS and the documentation, plus a testsuite addition would be welcome. At which point your contribution becomes non-trivial, so you should consider assigning copyright to the FSF. > > > diff -ru coreutils-6.12/src/md5sum.c coreutils-6.12-modified/src/md5sum.c > --- coreutils-6.12/src/md5sum.c 2008-05-25 23:40:33.000000000 -0700 > +++ coreutils-6.12-modified/src/md5sum.c 2008-12-20 > 18:25:19.000000000 -0800 > @@ -361,6 +361,21 @@ > return *s == '\0'; > } > > + > +/* Return true if S is a NUL-terminated string of printable characters. > + Otherwise, return false. */ > +static bool > +printable_chars (unsigned char const *s) > +{ > + while (isprint (*s) || isspace (*s)) This is susceptible to locale issues. Do you really want the error reporting to differ according to locale, or should this use the gnulib module c-ctype? > + { > + ++s; > + } For a one-liner statement, we typically omit the {}. > + > + return *s == '\0'; > +} > + > + > /* An interface to the function, DIGEST_STREAM. > Operate on FILENAME (it may be "-"). > > @@ -469,6 +484,13 @@ > if (line_length <= 0) > break; > > + /* Check if the line contains non-printable characters */ > + if (! printable_chars (line)) > + error (EXIT_FAILURE, 0, > + _("%s: %" PRIuMAX > + ": checksum file contains non-text characters"), > + checkfile_name, line_number); > + > /* Ignore comment lines, which begin with a '#' character. */ > if (line[0] == '#') > continue; > > > Of course, if there is a reasonable use case in which non-printable > characters > do appear, then this would be a bad idea and the current behavior would be > good enough. Is identifying non-printable characters the best heuristic for rejecting an input file as an invalid md5sum listing? Or is there a better heuristic worth applying? In one respect, the current behavior of silently ignoring invalid lines makes md5sum more like patch: by ignoring irrelevant lines until it identifies the data important to the program, it is much easier to run the program on an entire email than it is to extract just the relevant portion to feed to the tool. > > Cheers, > Jim > - -- Don't work too hard, make some time for fun as well! Eric Blake [email protected] -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Cygwin) Comment: Public key at home.comcast.net/~ericblake/eblake.gpg Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAklNymkACgkQ84KuGfSFAYC9/wCgicRhhdKONvqWKL3C3DaYZNHU U6MAnigOnyLmrGWwL2OhzFEhmLga+Puz =Vn0a -----END PGP SIGNATURE----- _______________________________________________ Bug-coreutils mailing list [email protected] http://lists.gnu.org/mailman/listinfo/bug-coreutils
