Date: Tue, 13 Apr 2021 10:16:26 +0100
From: Harald van Dijk <a...@gigawatt.nl>
Message-ID: <7ab68758-b423-ae1b-4451-cd02c4b6b...@gigawatt.nl>

I think we have probably largely converged about this issue (with Chet's
assistance) so this will probably be my last message about it.

  | ...your hypothetical example is not one of them, IMO. There is nothing
  | reasonable about saying that an activity that continues throughout the
  | execution of the script counts as start-up activity.

Of course it could be: a start-up activity is an activity that commences
at start-up time, how long it takes to complete is not specified. If the
text intended to say "might be added at shell initialisation" it could
have said that.

What matters here is that we have a hash table, that is effectively a
cache, which means things come, and go, more or less unpredictably (not
so if you look at the details of the implementation, but to an outside
observer).

E.g.: one possible thing would be for an implementation scanning PATH to
open and read each directory in turn, stopping as soon as it finds the
command sought (appropriate perms etc) - and I mean immediately, not
reading the rest of the directory currently being scanned - and adding
all ('x' permission allowed) regular files located during that scan to
the cache / hash table. That is, what gets added can depend upon the
order of files located in that final directory, which all depends upon
how the directory was created and how the implementation manages them.

Do remember that the standard is not legislation, it is an attempt to
specify what the implementations actually do, so readers know what they
can rely upon. In this case, how the hash table works is largely
unknowable from outside the implementation, so the standard is very
wishy-washy about how things get added, and what can be expected to be
there or not be there, while simultaneously alerting users to its likely
presence, and the need to deal with its ramifications in the odd case
that something may change which would invalidate cached data without the
shell being aware that has happened.

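To make that PATH-scanning example concrete, here is a minimal sketch in
Python. It is not taken from any real shell: the function name
lookup_and_cache and the plain dict used as the hash table are
assumptions made purely for illustration.

```python
import os
import stat


def lookup_and_cache(name, path_dirs, cache):
    """Search path_dirs for name, caching every executable file seen on the way.

    Hypothetical sketch: scanning stops the moment `name` is found, so which
    other commands end up in `cache` depends on directory iteration order.
    """
    if name in cache:
        # A cached entry is trusted without re-checking the filesystem,
        # which is exactly how stale data can end up being used.
        return cache[name]
    for directory in path_dirs:
        try:
            entries = os.scandir(directory)
        except OSError:
            continue  # unreadable or missing PATH entry: skip it
        with entries:
            for entry in entries:
                try:
                    st = entry.stat()  # follows symlinks
                except OSError:
                    continue
                if not stat.S_ISREG(st.st_mode):
                    continue
                if not os.access(entry.path, os.X_OK):
                    continue
                # The first directory to supply a name wins, as in a PATH search.
                cache.setdefault(entry.name, entry.path)
                if entry.name == name:
                    return entry.path  # stop immediately, mid-directory
    return None
```

Every directory searched before the one holding the command is cached in
full, but only part of that final directory is, and which part depends on
the order in which os.scandir happens to return entries. Nothing outside
the implementation can predict that order.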
There is nothing better that is really possible - "the result of an
unspecified start-up activity" is just a way of saying "things might
happen that cannot be explained by anything else in here, deal with it".

  | Your interpretation
  | would mean *all* activity can be considered start-up activity just by
  | virtue of being performed on a different thread where that different
  | thread was launched at shell startup,

Yes. But the only thing that gets to vary here is what is in or not in
the cache, nothing else depends upon this.

  | and then of course as that is just
  | an implementation detail invisible to a user of the shell, it doesn't
  | actually have to be performed on a different thread.

Of course.

  | If that were the intent, why would the standard say "shell start-up
  | activity" in the first place?

I wasn't there when it was written, but I'd guess they were mostly
thinking of pre-seeding of the hash table from PATH ... but were wise
enough to realise that can happen any time PATH is changed, or any time
the shell notices that the mod time of any directory in PATH has changed,
or for almost any other reason, and so wrote it in words that were not
highly specific, to allow for all of these kinds of variations, and more.

Most of the rest of the issues in this e-mail are now resolved, I
believe. There are just two outcomes that I need to be clear about:

1) it is not wrong for a shell to continue to scan PATH when exec fails
(other than ENOEXEC) and exec some later file with the same name which
does succeed. That's the original algorithm, it is what the standard is
trying to convey.

An extra point on this though is that I also have no real problem with
shells that decide to stop on the first file with 'x' permission found in
PATH, whether the exec succeeds or not.
The times when that ever makes a difference in real world environments
are so rare as to be irrelevant, and no-one should ever be depending on
something like that working (inserting a #!/bad "gcc" somewhere early on
PATH, and assuming it will be ignored). As Mark said (the one thing he
got correct), it is conforming for a shell to treat that as a shell
script containing only a comment (the #! there makes it unspecified,
which means the shell can treat it that way) on systems that don't
support #! executables. Just don't demand that shells act that way.

2) It is impossible (given current interfaces) for any command to ever
correctly predict what will be executed from a PATH search, and be 100%
accurate. Specifying "command -v" or "type" or anything else similar in a
way which pretends that is possible is a mistake. The only way to know
for sure is to attempt to execute it, and if the command happens to be
"halt" and you happen to be root at the time, you probably would not like
the results if all you were attempting to do was locate the path to the
halt command.

I'm certainly not going to pretend that what is in the standard now is
perfect, or cannot be made better, but when we do that, we need to be
aware of what the objective really is, and "make 'command -v' perfect" is
not that.

Oh, one last item from your mail:

  | We were talking about dash internals.

You might have been, I certainly wasn't. I know nothing about dash
internals (beyond from where it started, long long ago) or ksh internals
for that matter. Just about how a generic implementation might reasonably
behave, which is what matters when it comes to the standard.

kre
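
For reference, the search loop described in outcome 1 above could be
sketched roughly as follows, in Python. This is an illustration only, not
code from any shell: the name exec_via_path, the injectable exec_fn
parameter, the /bin/sh default and the choice to return an errno value
are all assumptions made so the loop can be read (and exercised) without
actually replacing the calling process.

```python
import errno
import os


def exec_via_path(argv, path_dirs, exec_fn=os.execv, shell="/bin/sh"):
    """Try each PATH candidate in turn; return only if every attempt failed.

    exec_fn is injectable (hypothetical design) so the loop can be tested
    with a fake; with the default os.execv a successful call never returns.
    """
    name = argv[0]
    last_error = errno.ENOENT
    for directory in path_dirs:
        candidate = os.path.join(directory, name)
        try:
            exec_fn(candidate, argv)
            return 0  # only reachable when exec_fn is a fake
        except OSError as exc:
            if exc.errno == errno.ENOEXEC:
                # Not a recognised executable format: stop searching and
                # run the file as a shell script instead.
                exec_fn(shell, [shell, candidate] + list(argv[1:]))
                return 0
            if exc.errno != errno.ENOENT:
                # e.g. EACCES: remember the failure, keep scanning PATH.
                last_error = exc.errno
    return last_error
```

Note that the only way this loop learns which file will run is by
attempting the exec itself, which is why a separate lookup utility cannot
be guaranteed to give the same answer.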