Hi Pádraig,

Pádraig Brady <[email protected]> writes:

>> My 0001 patch attached fixes that test case. I haven't pushed it yet in
>> case we decide that is not the desired behavior.
>
> Oops, sorry I pushed a fix for this already.

No worries!

>> I noticed another issue with Pádraig's patch that I mentioned privately,
>> but I think is worth discussion publicly in case anyone wants to
>> comment. I believe that it does violate POSIX requirements. The page for
>> 'head' says [1]:
>>
>>     STDOUT
>>         The standard output shall contain designated portions of the input
>>         files.
>>
>>         If multiple file operands are specified, head shall precede the
>>         output for each with the header:
>>
>>             "\n==> %s <==\n", <pathname>
>>
>>         except that the first header written shall not include the initial
>>         <newline>.
>>
>> My interpretation of this is that "pathname" refers to an unaltered name
>> from the command line. Conversion between "-" and "standard input" seems
>> okay since it isn't really a path.
>>
>> Given that the headers are mostly a visual aid, I am okay with the new
>> behavior. However, I propose following the existing behavior if
>> POSIXLY_CORRECT is defined. I have attached patch 0002 which does this.
>
> I think POSIX just hasn't considered/specified this case.
> I.e., it's just loosely stating <pathname> rather than
> rigidly specifying a raw path name.
> I'm not against the POSIXLY_CORRECT patch since it shouldn't cause problems,
> but I don't think it buys anything either as one can't robustly parse
> the output from tail(1) anyway due to the content possibly containing
> data matching the file header pattern.
> So I'd be 60:40 against adding this divergence.

I still feel my interpretation is correct, but perhaps the POSIX people
did not mean for it to be read that strictly. It would be good to get
clarification, especially since security people are using LLMs to find
any unescaped output and treat it as a "vulnerability" with little, if
any, consideration as to why the current behavior is the way that it
is...

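To make the quoted header rule and the parsing concern concrete, here is
a rough sketch in Python. It is not the coreutils C code; format_headers
and render are hypothetical names used only for illustration, and the
sketch models nothing beyond the "\n==> %s <==\n" format quoted above,
with the name printed unaltered.

```python
def format_headers(names):
    """Build one header per operand, per the quoted POSIX text.

    Every header is "\n==> %s <==\n" except the first, which omits
    the initial newline. Names are used as given (an assumption that
    matches the strict reading of <pathname>).
    """
    headers = []
    for index, name in enumerate(names):
        prefix = "" if index == 0 else "\n"
        headers.append(f"{prefix}==> {name} <==\n")
    return headers


def render(files):
    """Concatenate header + content for a list of (name, content) pairs."""
    names = [name for name, _ in files]
    headers = format_headers(names)
    return "".join(h + content for h, (_, content) in zip(headers, files))


# Three separate files ...
three_files = render([("a", "x\n"), ("b", "y\n"), ("c", "z\n")])

# ... versus two files, where the first one's content happens to
# contain text that looks exactly like a header for "b".
two_files = render([("a", "x\n\n==> b <==\ny\n"), ("c", "z\n")])

# The two byte streams are identical, so a reader of the output
# cannot tell which set of files produced it.
print(three_files == two_files)
```

If this model is right, the two outputs are indistinguishable even when
the file names themselves are perfectly ordinary, which is the sense in
which the headers cannot be parsed robustly regardless of how the name
is written.
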
> I wouldn't even skip the close(), as perhaps some weird device
> gives EBADF for some syscalls but not others.
> For edge cases like this, I'd keep the code as simple as possible,
> and just do what the user asked, unless it causes a particular problem.
>
> It's good to avoid repeated errors for _single files_ like we adjusted with:
> https://github.com/coreutils/coreutils/commit/0b2ff7637
> But for file specified multiply like this, it's best to process each
> independently.

Yes, I think I agree.

Collin
