Date: Thu, 15 Jul 2021 10:19:17 +0100
From: "Geoff Clare via austin-group-l at The Open Group" <austin-group-l@opengroup.org>
Message-ID: <20210715091917.GA13523@localhost>
Sorry, I've had other (more useful) things to do than deal with this...

  | You are looking at the wrong EXIT STATUS wording.  It is the wording
  | for exit status 0 that is different.
  |
  | cd:
  |
  |     0    The directory was successfully changed.
  |
  | pwd:
  |
  |     0    Successful completion.
  |
  | See the difference now?

Sure, there's a difference, but it all either means nothing, or leans
towards my interpretation of the standard.

For pwd, "0 Successful completion" has never been in doubt; the question
has always been "what is required of pwd for it to be considered
successful?"  That's where we started.

You contend that it must successfully write to standard output, because,
well, just because nothing else seems sane, if I understand what you've
been saying all this time.

I say that pwd must write to standard output, but that nothing says that
that write must actually succeed - because there simply is no text that
requires that (if there were, someone would have quoted it by now).

You argue that that makes no sense, as if no output appears, pwd hasn't
done anything useful - I say that that might be true, but that's how it
always has been, and the standard is supposed to document what the
industry practice is, not what you (or even I) might want it to be.
You believe that this was intentionally changed, but have provided no
evidence at all to support that contention (and nothing to justify
doing such a thing, if it had been done).

This all gets even more obvious when we consider the exit status for cd...

  82503    0    The current working directory was successfully changed and the value of the PWD
  82504         environment variable was set correctly.

  82508    >0   Either the -e option or the -P option is not in effect, and an error occurred.

(Let's just forget the (new) -e option for now; nothing material
changes in that case.)
Here the requirement is that the exit status is 0 if "The current
working directory was successfully changed" (and PWD was set, but
that's not an issue here), and ">0 if an error occurred".

Since the exit status cannot be both 0 and >0, by definition (there is
only one exit status), if the directory was successfully changed (and
PWD updated) then it is impossible for an error to have occurred.
That is, even if:

  82490  STDOUT
  82491    If a non-empty directory name from CDPATH is used, or if the operand '-' is used, an absolute
  82492    pathname of the new working directory shall be written to the standard output as follows:

fails.  That is, a write error writing to standard output cannot be
treated as an error that occurred, otherwise cd would be required to
exit both 0 and >0.

Again, this conforms with the (ancient) industry practice of ignoring
write errors on standard output (whether you, or anyone else, believes
that is a good thing, or reckless foolishness).  If it is not an error
for cd, then unless there is text somewhere to the contrary, it is not
an error that occurred for pwd either (the wording in the STDOUT
sections of the two commands is essentially the same).  No such text
has been quoted, so I assume no such text exists.

  | The text I have already referred to is perfectly sufficient.

No, it is not.

  | I am drawing
  | a conclusion from it that should be completely obvious to anyone with
  | even a rudimentary knowledge of computer terminology.

You're drawing a conclusion which seems like the only sane thing - but
the standard is not required to be sane if the implementations are not.
If the implementations don't check for these write errors, then it is
wrong of the standard to pretend that they do, and totally
unjustifiable to attempt to legislate to make that happen (that would
put the standards group into the position of being some kind of
monopoly cartel).

  | > actually says "shall successfully write to" ...

It doesn't.
  | | The word "successful" is in the description of exit status 0.

It is, but they are different things being successful: one is the
write (which isn't so required), the other is completion of the
command, which can be successful (as insane as it looks) when the
write has failed.

  | > Really?  Given all this unexplained data loss, there must be a whole
  | > raft of bug reports, and/or fixes, over the years, I assume that you
  | > have evidence of that, or are you just guessing?
  |
  | Are you genuinely asking me for *evidence* that some particular thing
  | has caused an event to go unnoticed or unexplained?

I am.  Otherwise what you're doing is just spreading FUD.

What if I were to assert that there must be many unexplained deaths in
the UK each year from undetected cases of yellow fever?  No-one tests
for that any more, as no-one believes it ever occurs; hence, there
simply must be many undetected cases (by your logic).  It certainly
could be happening.

Nonsense.  One cannot claim that something must be happening simply
because one can show that it might happen, unless one has some
evidence that it really does happen (in real world cases).

If you have the evidence, from examining a sampling of cases where
unexplained things happened, where you can produce enough evidence to
show that an undetected write error to standard output caused a loss
that wasn't otherwise noticed, then perhaps we can believe that there
are other similar cases - but you have to show that it really happens
(in real world cases, not imagined or test scenarios) first.

  | I didn't claim these events aren't rare - I said they had undoubtedly
  | happened many times.  That's across dozens of utilities used by millions
  | of users over almost three decades.

And if they had, someone would have noticed, at least a few times, and
reported it.  Otherwise all you have is supposition, guess work, FUD.
  | It's easy to see how data loss caused by an ENOSPC error could go
  | unnoticed or unexplained if not diagnosed by a utility.  Here's just
  | one plausible scenario...

Actually not so plausible.

  | You kick off a "find ... -exec grep -l ... {} +" command and go and
  | make a coffee.

I wouldn't, but that's beside the point (I'm not addicted to caffeine,
or other drugs...)

  | Another user has by mistake executed a runaway command that is
  | filling the disk.

That may have been plausible once, but rarely is any more - not
because runaway commands no longer fill disks, though that's getting
harder and harder to achieve with the amounts of storage around these
days, but because this "another user" is very unlikely to exist any
more - computers have become so cheap (along with attached storage)
that almost no-one runs real commands like this on shared systems any
more (long term or project storage may be shared, but local computing,
and working, happens on local systems).  But again, we can ignore
that, as it still is possible.

  | When it fails with an "out of space" error they
  | realise their mistake and remove the huge file they created.

And that could happen, though any reasonable system is going to leave
evidence in the system error log that a filesystem was full, so it is
easy to see that it happened.

  | During the short time the disk was full, one of your grep commands
  | failed to write some output but did not report it as an error.

That might happen.

  | When you return from your coffee break your find command has finished
  | and there is no indication that anything went wrong.

But not that.  Sure, one of the grep commands might have "failed to
write some output", but the commands in something like that, and the
blocks in the filesystem, aren't synchronised.
What you're almost certainly going to get is partial output from that
grep that failed, up to the end of the block that was part full before
the filesystem was filled up, but not the rest of it - when the next
grep (after the filesystem has had space returned) runs and produces
output, its output will be shoved into what appears to be the middle
of the previous output.  The resulting output file will appear
corrupted; it won't simply be missing a line.

It might be different if the filesystem was full before the command
started, so the very first output couldn't be written (that would be
at the start of a block) - but then the user running this is very
likely to notice that the filesystem is full, because they are active
at the time (you can no longer imagine all of this happening when
they're not paying attention).

But regardless of all of this, you're still completely missing the
point.  I am not arguing that it is a good thing that commands don't
detect write errors on standard output (except perhaps in cases like
cd, and rm -v, and another one or two I will mention below, and
similar cases, where the output is ancillary to the actual work being
done) - what I am arguing is that the standard, as it is written now,
does not require such checks, and that that is arguably the correct
thing for the standard to say.

  | Your users may well be among those who have suffered data loss because
  | of this bug, but they don't know that this bug was the cause, so they
  | can't report it.

Users very often don't know the actual cause of bugs, but they report
them anyway.  When reported, they're investigated, someone works out
what might have happened, and if appropriate fixes things.  No reports
at all means nothing went wrong (some users might simply say "huh?"
and try again; others report every little thing that doesn't work
exactly as they think it should, right or wrong.)
And to conclude: unless something that's actually new appears here -
as in, most probably, text that is currently in the standard that says
something different from what I am assuming (and which no-one has
been able to find up to now ... but as I said before, it is a BIG
standard, and sometimes it takes time to happen across the magic
sentence) - then I am unlikely to continue this, as I believe it to
be clear that the standard does not require this.

As I was saying, to conclude, consider two more commands which are
required to write to standard output, and where exiting 0, even in
the face of write errors, is probably the right thing to do.

First, make:

  98356  STDOUT
  98357    The make utility shall write all commands to be executed to standard output unless the -s option
  98358    was specified, the command is prefixed with an at-sign, or the special target .SILENT has either

Forget the "unless" stuff; we will just consider the cases where none
of that is true.

  98921  When the -q option is not specified, the make utility shall exit with one of the following values:
  98922    0    Successful completion.
  98923    >0   An error occurred.

Again, we can just ignore -q (not use it), but not doing so just
changes >0 to >1 (so is irrelevant).

In general, people want make to exit 0 if it successfully built the
target (or the target was already up to date) - while that is
happening, make typically writes lots to stdout, and those writes (or
some of them) might fail.  We still typically want exit(0) to mean
"target exists and is up to date" and no more than that.

Second, consider jobs.  This one is a command whose primary purpose
is to write to standard output (unlike make), so it would seem to be
a case where exit(0) should (if anywhere) mean that those writes
worked.  But:

  94110  DESCRIPTION
  94111    The jobs utility shall display the status of jobs that were started in the current shell environment;
  94112    see Section 2.12 (on page 2348).
  94151  STDOUT
  94152    If the -p option is specified, the output shall consist of one line for each process ID:
  94153        "%d\n", <process ID>
  94154    Otherwise, if the -l option is not specified, the output shall be a series of lines of the form:
  94155        "[%d] %c %s %s\n", <job-number>, <current>, <state>, <command>

  94197  EXIT STATUS
  94198    The following exit values shall be returned:
  94199      0    Successful completion.
  94200      >0   An error occurred.

All looks straightforward, right?  Except for this bit...

  94113  When jobs reports the termination status of a job, the shell shall remove its process ID from the
  94114  list of those ``known in the current shell execution environment''; see Section 2.9.3.1 (on page
  94115  2336).

When that is considered, it gets much harder to work out how to deal
with write errors to standard output.  If "reports" (which means
writes to standard output) there means "successfully writes", then we
have a problem, as the shell (where jobs is built in - otherwise it
has nothing to report and this whole issue is moot) will write
several lines to its output buffer before doing a "write" system
call, and when that fails, it won't know what has been reported, and
what has not, so it will have no idea which jobs should be removed
from the process table.

On the other hand, if "reports" just means "attempts to write", then
there is no issue: we ignore the write error, in the unlikely case it
happens, and simply remove any job whose output we attempted to
write.  Which I believe is what shells actually do.

There's lots more like this - it really is no surprise that no-one
has ever wanted to deal with this can of worms by actually requiring
what you believe should be required.

kre