Re: [Shell Command Language][shortcomings of command utlity][improving robustness of POSIX shells]

Robert Elz via austin-group-l at The Open Group Wed, 14 Apr 2021 09:16:12 -0700

    Date:        Tue, 13 Apr 2021 10:16:26 +0100
    From:        Harald van Dijk <a...@gigawatt.nl>
    Message-ID:  <7ab68758-b423-ae1b-4451-cd02c4b6b...@gigawatt.nl>


I think we have probably largely converged about this issue (with
Chet's assistance) so this will probably be my last message about it.

  | ...your hypothetical example is not one of them, IMO. There is nothing 
  | reasonable about saying that an activity that continues throughout the 
  | execution of the script counts as start-up activity.

Of course it could be.   A start-up activity is an activity that commences
at start-up time; how long it takes to complete is not specified.   If the
text intended to say "might be added at shell initialisation" it could have
said that.

What matters here is that we have a hash table, that is effectively a cache,
which means things come, and go, more or less unpredictably (not so if you
look at the details of the implementation, but so to an outside observer).
E.g. one possible approach would be for an implementation scanning PATH to
open and read each directory in turn, stopping as soon as it finds the
command sought (appropriate perms etc), and I mean immediately, not
reading the rest of the directory currently being scanned, while adding all
regular files with 'x' permission located during that scan to the
cache / hash-table.   That is, what gets added can depend upon the order
of files located in that final directory, which all depends upon how
the directory was created and how the implementation manages directories.

Do remember that the standard is not legislation, it is an attempt to
specify what the implementations actually do, so readers know what they
can rely upon.

In this case, how the hash table works is largely unknowable from outside
the implementation, so the standard is very wishy-washy about how things
get added, and what can be expected to be there or not be there, while
simultaneously alerting users to its likely presence, and the need to
deal with its ramifications in the odd case that something may change
which would invalidate cached data without the shell being aware that
this has happened.
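
As a rough illustration of that "odd case", the sketch below changes what
a PATH search would find behind the shell's back, then uses "hash -r" to
make the shell forget any remembered locations.   The directory and
command names (hash-demo.$$, hash_demo_cmd) are made up for the demo, and
whether the first lookup is remembered at all depends on the
implementation, so only the result after "hash -r" is predictable.

```shell
# Hypothetical scratch area, used only for this demonstration.
demo_dir=${TMPDIR:-/tmp}/hash-demo.$$
mkdir -p "$demo_dir/first" "$demo_dir/second"
old_path=$PATH
PATH=$demo_dir/first:$demo_dir/second:$PATH

# Initially only the later directory contains the command.
printf '#!/bin/sh\necho second\n' > "$demo_dir/second/hash_demo_cmd"
chmod +x "$demo_dir/second/hash_demo_cmd"
hash_demo_cmd >/dev/null   # found in "second"; the shell may remember that

# An earlier directory now gains a command of the same name,
# without the shell being told about it.
printf '#!/bin/sh\necho first\n' > "$demo_dir/first/hash_demo_cmd"
chmod +x "$demo_dir/first/hash_demo_cmd"
before=$(hash_demo_cmd)    # "first" or "second", depending on the shell

# Forget all remembered locations, forcing a fresh PATH search.
hash -r
after=$(hash_demo_cmd)

printf 'before hash -r: %s\n' "$before"
printf 'after  hash -r: %s\n' "$after"

# Clean up.
PATH=$old_path
rm -rf "$demo_dir"
```

A script that installs or replaces commands while it is running can use
"hash -r" (or simply assign to PATH) at the point where it knows the
cached data may be stale.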

There is nothing better that is really possible - "the result of an
unspecified start-up activity" is just a way of saying "things might
happen that cannot be explained by anything else in here, deal with it".

  | Your interpretation 
  | would mean *all* activity can be considered start-up activity just by 
  | virtue of being performed on a different thread where that different 
  | thread was launched at shell startup,

Yes.   But the only thing that gets to vary here is what is in or not
in the cache, nothing else depends upon this.

  | and then of course as that is just 
  | an implementation detail invisible to a user of the shell, it doesn't 
  | actually have to be performed on a different thread.

Of course.

  | If that were the intent, why would the standard say "shell start-up
  | activity" in the first place?

I wasn't there when it was written, but I'd guess they were mostly
thinking of pre-seeding of the hash table from PATH ... but were wise
enough to realise that can happen any time PATH is changed, or any time
the shell notices that the mod time of any directory in PATH has changed,
or for almost any other reason, and so wrote it in words that were not
highly specific, to allow for all of these kinds of variations, and more.

Most of the rest of the issues in this e-mail are now resolved, I believe.

There are just two outcomes that I need to be clear about:

1) it is not wrong for a shell to continue to scan PATH when exec
fails (other than ENOEXEC) and exec some later file with the same
name which does succeed.   That's the original algorithm, it is
what the standard is trying to convey.

An extra point on this though is that I also have no real problem
with shells that decide to stop on the first file with 'x' permission
found in PATH, whether the exec succeeds or not.   The times when
that ever makes a difference in real world environments are so rare
as to be irrelevant, and no-one should ever be depending on something
like that working (inserting a #!/bad "gcc" somewhere early in PATH,
and assuming it will be ignored).    As Mark said (the one thing he
got correct), it is conforming for a shell to treat that as
a shell script containing only a comment (the #! there makes it
unspecified, which means the shell can treat it that way) on systems
that don't support #! executables.

Just don't demand that shells act that way.
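
To see which files are in play in a case like that, the following is a
minimal sketch of a helper that lists, in PATH order, every regular file
with 'x' permission that has the given name.   The function name
path_candidates is made up for illustration, and this is not how any
particular shell implements its search.   A shell that stops at the first
candidate and one that carries on after a failed exec differ only in how
far down this list they are prepared to go.

```shell
# List every executable regular file called "$1" found via PATH, in order.
# The body is a subshell so the IFS and "set -f" changes do not leak out.
path_candidates() (
    name=$1
    p=${PATH}:      # the added colon keeps a trailing empty PATH element
    IFS=:
    set -f          # no pathname expansion while splitting $p
    for dir in $p; do
        # An empty PATH element means the current directory.
        [ -n "$dir" ] || dir=.
        file=$dir/$name
        if [ -f "$file" ] && [ -x "$file" ]; then
            printf '%s\n' "$file"
        fi
    done
)

# Example: show all candidates for "sh" in the current PATH.
path_candidates sh
```

The 'x' test here is only what the calling process is permitted to do; it
says nothing about whether an exec of any of these files would succeed.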

2) It is impossible (given current interfaces) for any command to ever
correctly predict what will be executed from a PATH search, and be
100% accurate.   Specifying "command -v" or "type" or anything else
similar in a way which pretends that is possible is a mistake.  The only
way to know for sure is to attempt to execute it, and if the command
happens to be "halt" and you happen to be root at the time, you probably
would not like the results if all you were attempting to do was locate
the path to the halt command.
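
A harmless way to see the gap between prediction and execution is
sketched below, using a made-up command (cmdv_demo_tool) in a scratch
directory.   The file has 'x' permission, so "command -v" can be expected
to report it, but its interpreter does not exist.   On systems that
support #! the attempt to run it would normally fail; the exact status,
and what happens on systems without #! support, vary between
implementations.

```shell
# Hypothetical scratch directory and command name, for this demo only.
demo_dir=${TMPDIR:-/tmp}/cmdv-demo.$$
mkdir -p "$demo_dir"

# A file with 'x' permission whose interpreter does not exist.
printf '#!/nonexistent/interpreter\n' > "$demo_dir/cmdv_demo_tool"
chmod +x "$demo_dir/cmdv_demo_tool"

# Ask the shell to predict what a PATH search would find ...
predicted=$(PATH="$demo_dir:$PATH" command -v cmdv_demo_tool)

# ... then find out what really happens by attempting to run it.
( PATH="$demo_dir:$PATH"; cmdv_demo_tool ) 2>/dev/null
status=$?

printf 'command -v reported: %s\n' "$predicted"
printf 'running it returned: %s\n' "$status"

# Clean up.
rm -rf "$demo_dir"
```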

I'm certainly not going to pretend that what is in the standard now is
perfect, or cannot be made better, but when we do that, we need to be
aware of what the objective really is, and "make 'command -v' perfect"
is not that.

Oh, one last item from your mail:

  | We were talking about dash internals.

You might have been, I certainly wasn't.  I know nothing about dash
internals (beyond from where it started, long long ago) or ksh internals
for that matter.   Just about how a generic implementation might
reasonably behave, which is what matters when it comes to the standard.

kre
