close 41657

No one else has commented therefore I am closing the bug ticket.  But
the discussion may continue here.

Michael Coleman wrote:
> Thanks very much for your prompt reply.  Certainly, if this is
> documented behavior, it's not a bug.  I would have never thought to
> check the documentation as the behavior seems so strange.

I am not always so generous about documented behavior *never* being a
bug. :-)

> If I understand correctly, the leading backslash in the first field
> is an indication that the second field is escaped.  (The first field
> never needs escapes, as far as I can see.)

Right.  But it was available to clue in the md5sum and others that the
file name was an "unsafe" file name and was going to be escaped there.

> Not sure I would have chosen this, but it can't really be changed
> now.  But, I suspect that almost no real shell script would deal
> with this escaping correctly.  Really, I'd be surprised if there
> were even one example.  If so, perhaps it could be changed without
> trouble.

Let's talk about the shell scripting part.  Why would this ever need
to be parsed in a shell script?  And if so then that is precisely
where it would need to be done due to the file name!

Your own example was a file name that consisted of a single
backslash.  Since the backslash is the shell escape character then
handling that in a shell script would require escaping it properly
with a second backslash.

I will suggest that the primary use for the *sum utility output is as
input to the same utility later to check the content for differences.
That's arguably the primary use of it.

There are also cases where we will want to use the *sum utilities on a
single file.  That's fine.  I think the problematic case here might be
a usage like this usage.

  sum=$(md5sum "$filename" | awk '{print$1}')
  printf "%s\n" "$sum"

And then there is that extra backslash at the start of the hash.
Well, yes, that is unfortunate.  But in this case we already have the
filename in a variable and don't want the filename from md5sum.  This
is very similar to portability problems between different versions of
'wc' and other utilities too.  (Some 'wc' utils print leading spaces
and some do not.)

As you already deduced if md5sum does not have a file name then it
does not know if it is escaped or not.  Reading standard input instead
doesn't have a name and therefore "-" is used as a placeholder as per
the tradition.

  sum=$(md5sum < "$filename" | awk '{print$1}')
  printf "%s\n" "$sum"

And because this is discussion I will note that the name is just one
of the possible names to a file.  Let's hard link it to a different
name.  And of course symbolic links are the same too.  A name is just
a pointer to a file.

  ln "$filename" foo
  md5sum foo
  d41d8cd98f00b204e9800998ecf8427e  foo

But I drift...

I think it likely you have already educated your people about the
problems and the solution was to read from stdin when the file name is
potentially untrusted "tainted" data.  (Since programming langauges
often refer to unknown untrusted data as "tainted" data for the
purpose of tracking what actions are safe upon it or not.  When taint
checking is enabled.)  Therefore if the name is unknown then it is
safer to avoid the name and use standard input.

And I suggest the same with other utilities such as 'wc' too.
Fortunately wc is not used to read back its own input.  Otherwise I am
sure someone would suggest that it would need the same escaping done
there too.  Example that thankfully does not actually exist:

  $ wc -l \\
  \0 \\

I am sure that if such a change were made that it would result in a
large wide spread breakage.  Let's hope that never happens.


Reply via email to