On 2021/04/21 19:11, Paul Eggert wrote:
On 4/18/21 10:46 AM, Peter van Dijk wrote:
While the manual (but not the manpage) mentions the data loss, I think it would be great if sort did not have this problem at all, and I think the OpenGroup text also says it should not have this problem.
 I looked around, and a lot of software does get this right (by opening a 
randomly-named temp file to write to, and only moving it into place when it is 
written successfully) - GNU sed -i, OpenBSD sort, and surely there are more. As 
a bonus, doing this would also make the `-o someinputfile -m` case safe.


I don't know of any 'sort' implementation that does not have the problem at all. ...
Nevertheless, it is the same problem as reported _1.5_months_
ago, when no one had time to look at the same design flaw
in the gnu-coreutils implementation of 'cp' (bug#47059).

That bug, still untriaged, had the same suggested solution:

  "When creating a link to a local file, I
   first create the link to a temporary name to ensure
   I have appropriate access (or that its not
   cross-linked in this case)."

At that time the bug was only reported against 'cp', but it
seems that not testing for final location writeability is
a gnu-bug stemming from mono-culture development where
outside ideas and bug reports tend to be ignored.

The previous, similar bug in 'cp' I reported was ignored for
1.5-2 YEARS, before a large enough corporation lost enough
data for GNU to pay attention.  Though in this case, did
the report against 'sort' get noticed because the reporter
wasn't female?  Perhaps others within GNU have inculcated
the biases of RMS and my feelings of tolerance were naive
(wouldn't be the first time).


That bug was left untriaged as bug#47059.  And it is the same
solution -- opening a randomly-named temp file to write to
and only performing final actions when writeability of
the destination is confirmed.
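
For illustration only, here is a minimal Python sketch of the
temp-file approach described above. It is not how GNU 'sort' or
any other implementation mentioned in this thread is written;
the function name sort_file_via_temp and the ".sort-tmp-" prefix
are made up for the example, and plain code-point ordering stands
in for sort's locale-aware comparison.

```python
import os
import tempfile


def sort_file_via_temp(input_path, output_path):
    """Write the sorted lines of input_path to output_path.

    The output is first written to a randomly named temp file in the
    same directory and only renamed into place once it is complete,
    so a failed write never truncates output_path.
    """
    with open(input_path, "r") as src:
        lines = src.readlines()

    # Make sure the last line is newline-terminated before sorting.
    lines = [ln if ln.endswith("\n") else ln + "\n" for ln in lines]
    lines.sort()  # code-point order; a stand-in for sort's collation

    out_dir = os.path.dirname(os.path.abspath(output_path))
    # mkstemp fails here if the destination directory is not writable,
    # before any existing data has been touched.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, prefix=".sort-tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.writelines(lines)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Rename within one directory; replaces output_path in one step.
        os.replace(tmp_path, output_path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this sketch, sort_file_via_temp("A", "A") is safe even though
input and output are the same file. Note that the rename gives the
output a new inode (and the temp file's permissions), so existing
hard links to the old output file are not carried over.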

Also, I don't see where the Open Group spec says what you're saying. On the contrary, the spec merely says that '-o output' should cause output to be sent to the output file. If there are multiple hard links to the output file, this suggests 'sort' should update the output file's contents without breaking any hard links. Admittedly the Open Group spec is a bit vague in this area, but I certainly don't see anything implying that GNU 'sort' does not conform to POSIX in this area.

FreeBSD 'sort' has a problem, in that 'sort -o A B' preserves all hard links to A's file, but 'sort -o A A' does not because it breaks the link from A. That's confusing.

Traditional Unix 'sort -o A' behaves the way GNU 'sort' does; it preserves all hard links to A's file. So there is a compatibility argument for doing things the way GNU 'sort' does them, even if that might lead to more data loss in rare cases.
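
The hard-link difference can be seen with a small Python sketch.
This is only a demonstration of the two filesystem behaviours under
discussion, not code from any 'sort' implementation; the helper
names rewrite_in_place, replace_via_rename and demo are hypothetical.

```python
import os
import tempfile


def rewrite_in_place(path, text):
    # Truncate and rewrite the existing file: same inode, links kept.
    with open(path, "w") as f:
        f.write(text)


def replace_via_rename(path, text):
    # Write a new file next to the target and rename it over the target:
    # the name now points at a new inode, so other links are left behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.replace(tmp, path)


def demo():
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "A")
        link = os.path.join(d, "A-link")
        with open(a, "w") as f:
            f.write("b\na\n")
        os.link(a, link)  # second hard link to A's file

        rewrite_in_place(a, "a\nb\n")
        linked_after_in_place = os.path.samefile(a, link)

        replace_via_rename(a, "a\nb\nc\n")
        linked_after_rename = os.path.samefile(a, link)

        with open(link) as f:
            link_text = f.read()
        return linked_after_in_place, linked_after_rename, link_text


print(demo())  # (True, False, 'a\nb\n')
```

After the in-place rewrite both names still refer to one file; after
the rename, A-link still holds the older contents.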





