Re: let head(1) understand `-' as stdin
On Oct 12 23:23:18, t...@math.ethz.ch wrote:
> > Let me clarify the idea.
> > If a filter recognizes '-' as a name for stdin,
> > then stdin can be one of the _multiple_ files being processed.
> > Filters that do not recognize '-' as a name, on the other hand,
> > only process stdin if it is the _only_ input.
>
> I understand that - is convenient, but it's not strictly needed.
> If you need the standard input as one of several files (or as an
> explicit file argument), you can pass /dev/stdin.

This is probably what I was missing. Thank you.
Sorry for the noise.
Re: let head(1) understand `-' as stdin
> Let me clarify the idea.
> If a filter recognizes '-' as a name for stdin,
> then stdin can be one of the _multiple_ files being processed.
> Filters that do not recognize '-' as a name, on the other hand,
> only process stdin if it is the _only_ input.

I understand that - is convenient, but it's not strictly needed.
If you need the standard input as one of several files (or as an
explicit file argument), you can pass /dev/stdin.
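As an illustration of the /dev/stdin suggestion, here is a minimal C sketch of a filter that treats every operand as a plain path name, with no special case for "-". The function names copy_head and head_path are hypothetical and are not taken from head.c, and the sketch assumes the system provides a /dev/stdin device node.

```c
#include <stdio.h>

/* Copy at most `count` lines from `in` to `out`. */
void
copy_head(FILE *in, FILE *out, int count)
{
	int ch;

	while (count > 0 && (ch = getc(in)) != EOF) {
		putc(ch, out);
		if (ch == '\n')
			count--;
	}
}

/*
 * Hypothetical helper: open the operand as an ordinary path name.
 * There is no special case for "-"; a caller that wants the standard
 * input as one of several files passes "/dev/stdin" instead.
 * Returns 0 on success, -1 if the file cannot be opened.
 */
int
head_path(const char *path, FILE *out, int count)
{
	FILE *fp;

	if ((fp = fopen(path, "r")) == NULL)
		return -1;
	copy_head(fp, out, count);
	fclose(fp);	/* closes the newly opened stream, not stdin itself */
	return 0;
}
```

A caller could then run head_path("a.txt", stdout, 10) followed by head_path("/dev/stdin", stdout, 10) to process the standard input as the second of two files.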
Re: let head(1) understand `-' as stdin
>> > > The diff below makes head(1) recognize `-'
>> > > as a name for the standard input,
>> > > as many other utilities do.
>
>On Oct 11 23:55:26, schwa...@usta.de wrote:
>> > Do standards permit that extension?
>>
>> POSIX neither requires nor forbids it, but encourages consistency
>> among all the utilities taking [file ...] arguments within a given
>> operating system:
>>
>>   http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html
>
>> > This is a command used in scripts. Scripts are often portable. If one
>> > operating system has an extension, but others don't, then those
>> > scripts become unportable due to use of these extensions.
>
>[Ingo's detailed analysis snipped]
>
>> > I'm not raising a new argument here, it's been raised numerous times
>> > when it comes to commands commonly used in scripts.
>> >
>> > So consider that first.
>>
>> head(1) is firmly a BSD thingy, and all BSDs agree that "-" is a
>> file name and not standard input. POSIX explicitly encourages
>> treating it as standard input ***if you do that for other utilities,
>> too***, and GNU coreutils has the only head(1) implementation I
>> found so far that actually does it.
>>
>> The bigger picture seems to be that OpenBSD and illumos tend to resist
>> treating "-" as standard input wherever resisting is allowed,
>> while GNU embraces treating "-" as standard input wherever allowed.
>> Most other systems seem to be somewhat inconsistent, in particular
>> in those cases where they imported GNU utilities.
>>
>> So much for the facts.
>>
>>
>> I see two ways forward that make sense to me:
>>
>>  a) Either remain conservative - in line with both BSD and SysV
>>     heritage - and not do it unless required by the standard.
>>
>>  b) Or switch over to doing it wherever allowed - but then we
>>     should do it not just for head(1), but also for tail(1),
>>     grep(1), sed(1) and probably several others, and then we
>>     should probably also try to push such patches to FreeBSD,
>>     DragonFly, NetBSD, and illumos, or at least give them heads-ups.
>>
>> Changing only head(1) and leaving everything else as it is does not
>> look like a complete plan to me. Even POSIX wouldn't encourage that.
>
>Thank you for the detailed analysis.
>
>If there is any interest in possibly going b)
>see below for a look at the other text filters.
>
>
>Let me clarify the idea.
>If a filter recognizes '-' as a name for stdin,
>then stdin can be one of the _multiple_ files being processed.
>Filters that do not recognize '-' as a name, on the other hand,
>only process stdin if it is the _only_ input.
>
>For example cat(1) and paste(1) do that, head(1) and tail(1) don't.
>And there are other utilities that could do that, but don't.
>Below is a list of text filters from bin/ and usr.bin/
>for which this seems relevant, separated into the two camps.
>
>	Jan
>
>
>These recognize '-' as a name for stdin
>as one of possibly many inputs:
>
>	cat
>	cmp
>	comm
>	cut
>	diff
>	file
>	join
>	lam
>	paste
>	pr
>	sdiff
>	sort
>
>
>These process stdin only if it is
>the only (unnamed) input:
>
>	column
>	expand
>	fmt
>	fold
>	grep
>	head
>	hexdump
>	nl
>	rev
>	tail
>	ul
>	unexpand
>	unvis
>	vis
>	wc

Bobby has pulled on Sally's hair, so everyone can pull on everyone's hair.

Your list means nothing. Please read the standards fully, and understand
the legal lawyer thing.
Re: let head(1) understand `-' as stdin
> > > The diff below makes head(1) recognize `-'
> > > as a name for the standard input,
> > > as many other utilities do.

On Oct 11 23:55:26, schwa...@usta.de wrote:
> > Do standards permit that extension?
>
> POSIX neither requires nor forbids it, but encourages consistency
> among all the utilities taking [file ...] arguments within a given
> operating system:
>
>   http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html

> > This is a command used in scripts. Scripts are often portable. If one
> > operating system has an extension, but others don't, then those
> > scripts become unportable due to use of these extensions.

[Ingo's detailed analysis snipped]

> > I'm not raising a new argument here, it's been raised numerous times
> > when it comes to commands commonly used in scripts.
> >
> > So consider that first.
>
> head(1) is firmly a BSD thingy, and all BSDs agree that "-" is a
> file name and not standard input. POSIX explicitly encourages
> treating it as standard input ***if you do that for other utilities,
> too***, and GNU coreutils has the only head(1) implementation I
> found so far that actually does it.
>
> The bigger picture seems to be that OpenBSD and illumos tend to resist
> treating "-" as standard input wherever resisting is allowed,
> while GNU embraces treating "-" as standard input wherever allowed.
> Most other systems seem to be somewhat inconsistent, in particular
> in those cases where they imported GNU utilities.
>
> So much for the facts.
>
>
> I see two ways forward that make sense to me:
>
>  a) Either remain conservative - in line with both BSD and SysV
>     heritage - and not do it unless required by the standard.
>
>  b) Or switch over to doing it wherever allowed - but then we
>     should do it not just for head(1), but also for tail(1),
>     grep(1), sed(1) and probably several others, and then we
>     should probably also try to push such patches to FreeBSD,
>     DragonFly, NetBSD, and illumos, or at least give them heads-ups.
>
> Changing only head(1) and leaving everything else as it is does not
> look like a complete plan to me. Even POSIX wouldn't encourage that.

Thank you for the detailed analysis.

If there is any interest in possibly going b)
see below for a look at the other text filters.


Let me clarify the idea.
If a filter recognizes '-' as a name for stdin,
then stdin can be one of the _multiple_ files being processed.
Filters that do not recognize '-' as a name, on the other hand,
only process stdin if it is the _only_ input.

For example cat(1) and paste(1) do that, head(1) and tail(1) don't.
And there are other utilities that could do that, but don't.
Below is a list of text filters from bin/ and usr.bin/
for which this seems relevant, separated into the two camps.

	Jan


These recognize '-' as a name for stdin
as one of possibly many inputs:

	cat
	cmp
	comm
	cut
	diff
	file
	join
	lam
	paste
	pr
	sdiff
	sort


These process stdin only if it is
the only (unnamed) input:

	column
	expand
	fmt
	fold
	grep
	head
	hexdump
	nl
	rev
	tail
	ul
	unexpand
	unvis
	vis
	wc
Re: let head(1) understand `-' as stdin
Ingo Schwarze writes:

> Hi,
>
> Theo de Raadt wrote on Tue, Oct 11, 2016 at 01:35:34PM -0600:
>> jca@ wrote:
>>> Jan Stary writes:
>
>>>> The diff below makes head(1) recognize `-'
>>>> as a name for the standard input,
>>>> as many other utilities do.
>
>>> Makes sense to me. The following points could be improved IMO:
>>> - using strcmp sounds cleaner than those char comparisons
>>> - I don't think the man page bits are needed. Utilities that read from
>>>   stdin are supposed to support `-'. I'm not sure whether the extra
>>>   example is really helpful.
>>> - should we avoid closing stdin (multiple times)? Even though our
>>>   fclose(3) seems to cope with this, it seems that neither the
>>>   C standard nor POSIX offers such a guarantee.
>
>> Do standards permit that extension?
>
> POSIX neither requires nor forbids it, but encourages consistency
> among all the utilities taking [file ...] arguments within a given
> operating system:
>
>   http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html
>
>   NAME
>     head - copy the first part of files
>   OPTIONS
>     The head utility shall conform to XBD Utility Syntax Guidelines.
>   STDIN
>     The standard input shall be used if no file operands are
>     specified, and shall be used if a file operand is '-' and the
>     implementation treats the '-' as meaning standard input.
>     Otherwise, the standard input shall not be used. See the INPUT
>     FILES section.
>
>   http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html
>
>   Guideline 13:
>     For utilities that use operands to represent files to be opened
>     for either reading or writing, the '-' operand should be used
>     to mean only standard input (or standard output when it is clear
>     from context that an output file is being specified) or a file
>     named -.
>
>   Where a utility described in the Shell and Utilities volume of
>   POSIX.1-2008 as conforming to these guidelines is required to
>   accept, or not to accept, the operand '-' to mean standard input
>   or output, this usage is explained in the OPERANDS section.
>   Otherwise, if such a utility uses operands to represent files,
>   it is implementation-defined whether the operand '-' stands for
>   standard input (or standard output), or for a file named -.
>
> (Enjoy language lawyers' paradise.)
>
>> This is a command used in scripts. Scripts are often portable. If one
>> operating system has an extension, but others don't, then those
>> scripts become unportable due to use of these extensions.
>
>  * 1BSD first had head(1) (by Bill Joy 1977),
>    of course treats "-" as a filename
>  * AT&T System V UNIX didn't provide head(1) at all
>  * NetBSD treats "-" as a filename
>  * FreeBSD treats "-" as a filename
>  * DragonFly treats "-" as a filename
>  * illumos treats "-" as a filename
>  * Oracle Solaris 11 treats "-" as a filename
>  * GNU coreutils treats "-" as standard input
>
> Some related utilities:
>
>                tail(1)    grep(1)    sed(1)
>  source:       UNIX v7    UNIX v4    UNIX v7    (of course all filename)
>
>  * 4.4BSD-L2   filename   filename   filename
>  * System V    filename   filename   filename
>  * OpenBSD     filename   filename   filename
>  * NetBSD      filename   stdin      filename
>  * FreeBSD     filename   stdin      filename
>  * DragonFly   filename   stdin      filename
>  * illumos     filename   filename   filename
>  * Solaris 11  stdin      filename   filename
>  * GNU         stdin      stdin      stdin
>
> cat(1), sort(1): POSIX requires treating "-" as standard input
>
>> I'm not raising a new argument here, it's been raised numerous times
>> when it comes to commands commonly used in scripts.
>>
>> So consider that first.
>
> head(1) is firmly a BSD thingy, and all BSDs agree that "-" is a
> file name and not standard input. POSIX explicitly encourages
> treating it as standard input ***if you do that for other utilities,
> too***, and GNU coreutils has the only head(1) implementation I
> found so far that actually does it.
>
> The bigger picture seems to be that OpenBSD and illumos tend to resist
> treating "-" as standard input wherever resisting is allowed,
> while GNU embraces treating "-" as standard input wherever allowed.
> Most other systems seem to be somewhat inconsistent, in particular
> in those cases where they imported GNU utilities.
>
> So much for the facts.

Thanks a lot for this thorough analysis.

>
> I see two ways forward that make sense to me:
>
>  a) Either remain conservative - in line with both BSD and SysV
>     heritage - and not do it unless required by the standard.
>
>  b) Or switch over to doing it wherever allowed - but then we
>     should do it not just for head(1), but also for tail(1),
>     grep(1), sed(1) and probably several others, and then we
>     should probably also try to push such patches to
Re: let head(1) understand `-' as stdin
Hi,

Theo de Raadt wrote on Tue, Oct 11, 2016 at 01:35:34PM -0600:
> jca@ wrote:
>> Jan Stary writes:

>>> The diff below makes head(1) recognize `-'
>>> as a name for the standard input,
>>> as many other utilities do.

>> Makes sense to me. The following points could be improved IMO:
>> - using strcmp sounds cleaner than those char comparisons
>> - I don't think the man page bits are needed. Utilities that read from
>>   stdin are supposed to support `-'. I'm not sure whether the extra
>>   example is really helpful.
>> - should we avoid closing stdin (multiple times)? Even though our
>>   fclose(3) seems to cope with this, it seems that neither the
>>   C standard nor POSIX offers such a guarantee.

> Do standards permit that extension?

POSIX neither requires nor forbids it, but encourages consistency
among all the utilities taking [file ...] arguments within a given
operating system:

  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html

  NAME
    head - copy the first part of files
  OPTIONS
    The head utility shall conform to XBD Utility Syntax Guidelines.
  STDIN
    The standard input shall be used if no file operands are
    specified, and shall be used if a file operand is '-' and the
    implementation treats the '-' as meaning standard input.
    Otherwise, the standard input shall not be used. See the INPUT
    FILES section.

  http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html

  Guideline 13:
    For utilities that use operands to represent files to be opened
    for either reading or writing, the '-' operand should be used
    to mean only standard input (or standard output when it is clear
    from context that an output file is being specified) or a file
    named -.

  Where a utility described in the Shell and Utilities volume of
  POSIX.1-2008 as conforming to these guidelines is required to
  accept, or not to accept, the operand '-' to mean standard input
  or output, this usage is explained in the OPERANDS section.
  Otherwise, if such a utility uses operands to represent files,
  it is implementation-defined whether the operand '-' stands for
  standard input (or standard output), or for a file named -.

(Enjoy language lawyers' paradise.)

> This is a command used in scripts. Scripts are often portable. If one
> operating system has an extension, but others don't, then those
> scripts become unportable due to use of these extensions.

 * 1BSD first had head(1) (by Bill Joy 1977),
   of course treats "-" as a filename
 * AT&T System V UNIX didn't provide head(1) at all
 * NetBSD treats "-" as a filename
 * FreeBSD treats "-" as a filename
 * DragonFly treats "-" as a filename
 * illumos treats "-" as a filename
 * Oracle Solaris 11 treats "-" as a filename
 * GNU coreutils treats "-" as standard input

Some related utilities:

               tail(1)    grep(1)    sed(1)
 source:       UNIX v7    UNIX v4    UNIX v7    (of course all filename)

 * 4.4BSD-L2   filename   filename   filename
 * System V    filename   filename   filename
 * OpenBSD     filename   filename   filename
 * NetBSD      filename   stdin      filename
 * FreeBSD     filename   stdin      filename
 * DragonFly   filename   stdin      filename
 * illumos     filename   filename   filename
 * Solaris 11  stdin      filename   filename
 * GNU         stdin      stdin      stdin

cat(1), sort(1): POSIX requires treating "-" as standard input

> I'm not raising a new argument here, it's been raised numerous times
> when it comes to commands commonly used in scripts.
>
> So consider that first.

head(1) is firmly a BSD thingy, and all BSDs agree that "-" is a
file name and not standard input. POSIX explicitly encourages
treating it as standard input ***if you do that for other utilities,
too***, and GNU coreutils has the only head(1) implementation I
found so far that actually does it.

The bigger picture seems to be that OpenBSD and illumos tend to resist
treating "-" as standard input wherever resisting is allowed,
while GNU embraces treating "-" as standard input wherever allowed.
Most other systems seem to be somewhat inconsistent, in particular
in those cases where they imported GNU utilities.

So much for the facts.


I see two ways forward that make sense to me:

 a) Either remain conservative - in line with both BSD and SysV
    heritage - and not do it unless required by the standard.

 b) Or switch over to doing it wherever allowed - but then we
    should do it not just for head(1), but also for tail(1),
    grep(1), sed(1) and probably several others, and then we
    should probably also try to push such patches to FreeBSD,
    DragonFly, NetBSD, and illumos, or at least give them heads-ups.

Changing only head(1) and leaving everything else as it is does not
look like a complete plan to me. Even POSIX wouldn't encourage that.

Yours,
  Ingo
Re: let head(1) understand `-' as stdin
On Oct 11 13:35:34, dera...@openbsd.org wrote:
> This is a command used in scripts. Scripts are often portable. If one
> operating system has an extension, but others don't, then those
> scripts become unportable due to use of these extensions.

GNU head(1) has it, Solaris does not.
(I don't have access to others right now.)
Re: let head(1) understand `-' as stdin
On Oct 11 21:27:54, j...@wxcvbn.org wrote:
> Jan Stary writes:
>
> > The diff below makes head(1) recognize `-'
> > as a name for the standard input,
> > as many other utilities do.
>
> Makes sense to me. The following points could be improved IMO:

Updated diff below.

> - using strcmp sounds cleaner than those char comparisons

OK

> - I don't think the man page bits are needed. Utilities that read from
>   stdin are supposed to support `-'. I'm not sure whether the extra
>   example is really helpful.

I have removed the example.
I think the one sentence about "-" should stay;
other utils which recognize "-" mention it.

> - should we avoid closing stdin (multiple times)?

fixed

OK?

	Jan


Index: head.1
===
RCS file: /cvs/src/usr.bin/head/head.1,v
retrieving revision 1.23
diff -u -p -r1.23 head.1
--- head.1	25 Oct 2015 21:50:32 -	1.23
+++ head.1	11 Oct 2016 21:05:07 -
@@ -47,6 +47,9 @@ utility copies the first
 lines of each specified
 .Ar file
 to the standard output.
+A name of
+.Sq -
+is recognized as standard input.
 If no files are named,
 .Nm
 copies lines from the standard input.
Index: head.c
===
RCS file: /cvs/src/usr.bin/head/head.c,v
retrieving revision 1.21
diff -u -p -r1.21 head.c
--- head.c	20 Mar 2016 17:14:51 -	1.21
+++ head.c	11 Oct 2016 21:05:07 -
@@ -30,6 +30,7 @@
  */
 
 #include
+#include
 #include
 #include
 #include
@@ -93,7 +94,8 @@ main(int argc, char *argv[])
 			if (pledge("stdio", NULL) == -1)
 				err(1, "pledge");
 		} else {
-			if ((fp = fopen(*argv, "r")) == NULL) {
+			fp = strcmp(*argv, "-") ? fopen(*argv, "r") : stdin;
+			if (fp == NULL) {
 				warn("%s", *argv++);
 				status = 1;
 				continue;
@@ -101,7 +103,8 @@ main(int argc, char *argv[])
 			if (argc > 1) {
 				if (!firsttime)
 					putchar('\n');
-				printf("==> %s <==\n", *argv);
+				printf("==> %s <==\n",
+				    fp == stdin ? "(stdin)" : *argv);
 			}
 			++argv;
 		}
@@ -109,7 +112,8 @@ main(int argc, char *argv[])
 			while ((ch = getc(fp)) != EOF)
 				if (putchar(ch) == '\n')
 					break;
-		fclose(fp);
+		if (fp != stdin)
+			fclose(fp);
 	}
 	/*NOTREACHED*/
 }
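The three changes in the head.c part of the diff (strcmp against "-", the "(stdin)" banner, and not closing stdin) can be sketched in isolation as follows. This is only an illustrative sketch, not the code in the diff or in the tree: the helper names open_operand, display_name and close_operand are hypothetical, and the diff itself makes these changes inline in main().

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: map a file operand to a stream.
 * "-" stands for the standard input; any other name is opened for
 * reading.  Returns NULL if fopen(3) fails.
 */
FILE *
open_operand(const char *name)
{
	if (strcmp(name, "-") == 0)
		return stdin;
	return fopen(name, "r");
}

/* Hypothetical helper: the name shown in the "==> name <==" banner. */
const char *
display_name(const char *name, FILE *fp)
{
	return fp == stdin ? "(stdin)" : name;
}

/*
 * Hypothetical helper: close a stream obtained from open_operand(),
 * leaving stdin open so that "-" may appear several times without
 * stdin being closed more than once (or at all).
 */
int
close_operand(FILE *fp)
{
	if (fp == stdin)
		return 0;
	return fclose(fp);
}
```

With helpers like these, a command line such as "head a.txt - b.txt" would open a.txt and b.txt with fopen(3) and read the middle operand from the already open stdin.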
Re: let head(1) understand `-' as stdin
> On 2016/10/11 13:35, Theo de Raadt wrote:
> > > Jan Stary writes:
> > >
> > > > The diff below makes head(1) recognize `-'
> > > > as a name for the standard input,
> > > > as many other utilities do.
> > >
> > > Makes sense to me. The following points could be improved IMO:
> > > - using strcmp sounds cleaner than those char comparisons
> > > - I don't think the man page bits are needed. Utilities that read from
> > >   stdin are supposed to support `-'. I'm not sure whether the extra
> > >   example is really helpful.
> > > - should we avoid closing stdin (multiple times)? Even though our
> > >   fclose(3) seems to cope with this, it seems that neither the
> > >   C standard nor POSIX offers such a guarantee.
> >
> > Do standards permit that extension?
> >
> > This is a command used in scripts. Scripts are often portable. If one
> > operating system has an extension, but others don't, then those
> > scripts become unportable due to use of these extensions.
> >
> > I'm not raising a new argument here, it's been raised numerous times
> > when it comes to commands commonly used in scripts.
> >
> > So consider that first.
>
> Standards permit it but don't require it, so it's already a mess.
>
> "The standard input shall be used if no file operands are specified, and
> shall be used if a file operand is '-' and the implementation treats the
> '-' as meaning standard input. Otherwise, the standard input shall not
> be used. See the INPUT FILES section."
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html
>
> Same for tail, of course.

Well, in that case, support for "-" is probably increasing compatibility
rather than diminishing it.
Re: let head(1) understand `-' as stdin
On 2016/10/11 13:35, Theo de Raadt wrote:
> > Jan Stary writes:
> >
> > > The diff below makes head(1) recognize `-'
> > > as a name for the standard input,
> > > as many other utilities do.
> >
> > Makes sense to me. The following points could be improved IMO:
> > - using strcmp sounds cleaner than those char comparisons
> > - I don't think the man page bits are needed. Utilities that read from
> >   stdin are supposed to support `-'. I'm not sure whether the extra
> >   example is really helpful.
> > - should we avoid closing stdin (multiple times)? Even though our
> >   fclose(3) seems to cope with this, it seems that neither the
> >   C standard nor POSIX offers such a guarantee.
>
> Do standards permit that extension?
>
> This is a command used in scripts. Scripts are often portable. If one
> operating system has an extension, but others don't, then those
> scripts become unportable due to use of these extensions.
>
> I'm not raising a new argument here, it's been raised numerous times
> when it comes to commands commonly used in scripts.
>
> So consider that first.

Standards permit it but don't require it, so it's already a mess.

"The standard input shall be used if no file operands are specified, and
shall be used if a file operand is '-' and the implementation treats the
'-' as meaning standard input. Otherwise, the standard input shall not
be used. See the INPUT FILES section."

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html

Same for tail, of course.
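The quoted STDIN rule can be read as a small decision function. The following C sketch is one possible reading, offered only as an illustration: the function name uses_stdin and the dash_means_stdin parameter (modelling the implementation's choice) are hypothetical and come from neither POSIX nor head.c.

```c
#include <string.h>

/*
 * Sketch of the quoted rule: does the utility read the standard input?
 * `files` holds the file operands; `dash_means_stdin` is nonzero when
 * the implementation treats the operand "-" as the standard input.
 */
int
uses_stdin(int nfiles, char *const files[], int dash_means_stdin)
{
	int i;

	if (nfiles == 0)
		return 1;	/* no file operands: stdin is used */
	if (!dash_means_stdin)
		return 0;	/* "-" is just a file named "-" */
	for (i = 0; i < nfiles; i++)
		if (strcmp(files[i], "-") == 0)
			return 1;
	return 0;		/* otherwise stdin shall not be used */
}
```

Under this reading, the same operand list "a.txt -" reads stdin on an implementation that chose one way and opens a file named "-" on one that chose the other, which is the portability concern raised in the thread.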
Re: let head(1) understand `-' as stdin
> Jan Stary writes:
>
> > The diff below makes head(1) recognize `-'
> > as a name for the standard input,
> > as many other utilities do.
>
> Makes sense to me. The following points could be improved IMO:
> - using strcmp sounds cleaner than those char comparisons
> - I don't think the man page bits are needed. Utilities that read from
>   stdin are supposed to support `-'. I'm not sure whether the extra
>   example is really helpful.
> - should we avoid closing stdin (multiple times)? Even though our
>   fclose(3) seems to cope with this, it seems that neither the
>   C standard nor POSIX offers such a guarantee.

Do standards permit that extension?

This is a command used in scripts. Scripts are often portable. If one
operating system has an extension, but others don't, then those
scripts become unportable due to use of these extensions.

I'm not raising a new argument here, it's been raised numerous times
when it comes to commands commonly used in scripts.

So consider that first.