On 11/01/2013 06:53 PM, Pádraig Brady wrote:
> On 11/01/2013 06:20 PM, Eric Blake wrote:
>> On 11/01/2013 11:03 AM, Pádraig Brady wrote:
>>
>>>>
>>>> Escape the output (marking with a leading '\' and backslash-escaping
>>>> both '\' and '\n') only when the file name contains a newline.
>>>> Before, we would do that for a file name containing either newline or
>>>> backslash.
>>>>
>>>> This probably deserves a NEWS entry, since it is user-visible.
>>>
>>> I debated that as I thought it could have no impact on anything,
>>> but it could actually if one was comparing old and new outputs?
>>>
>>> newsum=$(md5sum my file set | md5sum)
>>> [ "$newsum" = "$(cat ./oldsum)" ] || error
>>
>> Not just that, but the new format is not necessarily parseable by older
>> md*sum. Your patch didn't show (but probably should be enhanced) what
>> happens for a file named 'a\nb'; pre-patch, it gave '\sum a\\nb',
>> post-patch it gives 'sum a\nb'
>
> Right.
>
>> - but if the older utility assumes that
>> the missing leading \ was a mistake and unescapes the file name, it
>> results in looking for a file as a three-byte name "a<newline>b", which
>> is also part of the user-visible change.
>
> Right but that's a big if.
> So you're referring to non GNU utils parsing these checksum files,
> and not honoring the leading \ escape marker.
> That's quite unlikely I would think.
>
>> Breaking output so that older versions can't parse newer output has been
>> one of the reasons that I have only threatened to patch \r handling,
>> rather than actually doing it, because it's tricky to think about
>> old/new interactions and what might break. Depending on how
>> conservative we are trying to be, we may need to add a command line
>> option that will let the user forcefully revert to the older-style
>> output for intentional interaction with older checksum tools regardless
>> of filename. For 99% of the cases, the output is identical, since files
>> with \n or \\ in the name are already rare.
>> Thinking aloud, it may be
>> appropriate to have such a mode option be tri-state (old, new, or warn;
>> with default being warn), where the warning mode gives the new output
>> but ALSO flags to the user that their output may not be parseable by
>> older summing utilities.
>
> Well any change here isn't worth a flag I think.
> Even for \r one can always `tr -d '\r'` the DOS files before processing.

Or dos2unix to be careful to only process EOLs:

  $ printf 'a\rb\r\r\n' | dos2unix | od -tx1
  0000000 61 0d 62 0d 0a

> The only reason I was avoiding the redundant '\' escaping
> was to avoid having to do the unescaping like in cleanup_sum()
> here for example http://fslint.googlecode.com/svn/trunk/fslint/findup
> But I suppose even that's not general.
>
> OK I think it's not worth changing the output format now,
> given the possibility of non GNU tools parsing incorrectly,
> and the edge case where the output is directly compared
> to older output.
>
> I'll just do a maint commit to optimize/document a bit.

Pushed the non user visible adjustment at:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commit;h=4d94e65

cheers,
Pádraig.
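
For reference, a rough sketch of the \r point above, for systems without dos2unix. The helper names (strip_all_cr, strip_eol_cr, hex) are made up for illustration, and the sed substitution is only an assumed stand-in for the end-of-line behaviour shown in the dos2unix example, not how dos2unix itself is implemented.

```shell
# Remove every CR byte, including ones embedded in the data.
strip_all_cr() {
    tr -d '\r'
}

# Remove a single CR only where it directly precedes the newline.
# The literal CR is produced with printf so the sed script does not
# depend on sed understanding the \r escape.
strip_eol_cr() {
    cr=$(printf '\r')
    sed "s/${cr}\$//"
}

# Hex-dump stdin as space-separated bytes on one line.
hex() {
    od -An -tx1 | xargs
}

printf 'a\rb\r\r\n' | strip_all_cr | hex   # 61 62 0a
printf 'a\rb\r\r\n' | strip_eol_cr | hex   # 61 0d 62 0d 0a
```

The second form leaves the embedded 0d bytes alone and only drops the one before the final 0a, matching the od output quoted above.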
