Re: let head(1) understand `-' as stdin

2016-10-12 Thread Jan Stary
On Oct 12 23:23:18, t...@math.ethz.ch wrote:
> > Let me clarify the idea. 
> > If a filter recognizes '-' as a name for stdin,
> > then stdin can be one of the _multiple_ files being processed.
> > Filters that do not recognize '-' as a name, on the other hand,
> > only process stdin if it is the _only_ input.
> 
> I understand that - is convenient, but it's not strictly needed.
> If you need the standard input as one of several files (or as an
> explicit file argument), you can pass /dev/stdin.

This is probably what I was missing.
Thank you. Sorry for the noise.



Re: let head(1) understand `-' as stdin

2016-10-12 Thread Theo Buehler
> Let me clarify the idea. 
> If a filter recognizes '-' as a name for stdin,
> then stdin can be one of the _multiple_ files being processed.
> Filters that do not recognize '-' as a name, on the other hand,
> only process stdin if it is the _only_ input.

I understand that - is convenient, but it's not strictly needed.
If you need the standard input as one of several files (or as an
explicit file argument), you can pass /dev/stdin.
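
A minimal sketch of why this works, assuming a system that provides
/dev/stdin: a filter that knows nothing about "-" simply hands every
operand to fopen(3), and /dev/stdin is opened like any other path. The
helper name count_lines is made up for this illustration and is not
taken from any utility discussed here.

```c
#include <stdio.h>

/*
 * Count the newline characters in the named file.
 * Returns -1 if the file cannot be opened.
 * No operand is special-cased: "/dev/stdin" is just another path.
 */
static long
count_lines(const char *path)
{
    FILE *fp = fopen(path, "r");
    long n = 0;
    int ch;

    if (fp == NULL)
        return -1;
    while ((ch = getc(fp)) != EOF)
        if (ch == '\n')
            n++;
    /* This closes the stream opened above, not the global stdin. */
    fclose(fp);
    return n;
}
```

Called as count_lines("/dev/stdin"), the same code reads the standard
input; at the command line this corresponds to something like
"head a.txt /dev/stdin b.txt".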



Re: let head(1) understand `-' as stdin

2016-10-12 Thread Theo de Raadt
>> > > The diff below makes head(1) recognize `-'
>> > > as a name for the standard input,
>> > > as many other utilities do.
>
>On Oct 11 23:55:26, schwa...@usta.de wrote:
>> > Do standards permit that extension?
>> 
>> POSIX neither requires nor forbids it, but encourages consistency
>> among all the utilities taking [file ...] arguments within a given
>> operating system:
>> 
>>   http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html
>
>> > This is a command used in scripts.  Scripts are often portable.  If one
>> > operating system has an extension, but others don't, then those
>> > scripts become unportable due to use of these extensions.
>
>[Ingo's detailed analysis snipped]
>
>> > I'm not raising a new argument here, it's been raised numerous times
>> > when it comes to commands commonly used in scripts.
>> > 
>> > So consider that first.
>> 
>> head(1) is firmly a BSD thingy, and all BSDs agree that "-" is a
>> file name and not standard input.  POSIX explicitly encourages
>> treating it as standard input ***if you do that for other utilities,
>> too***, and GNU coreutils has the only head(1) implementation i
>> found so far that actually does it.
>> 
>> The bigger picture seems to be that OpenBSD and illumos tend to resist
>> treating "-" as standard input wherever resisting is allowed,
>> while GNU embraces treating "-" as standard input wherever allowed.
>> Most other systems seem to be somewhat inconsistent, in particular
>> in those cases where they imported GNU utilities.
>> 
>> So much for the facts.
>> 
>> 
>> I see two ways forward that make sense to me:
>> 
>>  a) Either remain conservative - in line with both BSD and SysV
>>     heritage - and not do it unless required by the standard.
>> 
>>  b) Or switch over to doing it wherever allowed - but then we
>>     should do it not just for head(1), but also for tail(1),
>>     grep(1), sed(1) and probably several others, and then we
>>     should probably also try to push such patches to FreeBSD,
>>     DragonFly, NetBSD, and illumos, or at least give them heads-ups.
>> 
>> Changing only head(1) and leaving everything else as it is does not
>> look like a complete plan to me.  Even POSIX wouldn't encourage that.
>
>Thank you for the detailed analysis.
>
>If there is any interest in possibly going with b),
>see below for a look at the other text filters.
>
>
>Let me clarify the idea. 
>If a filter recognizes '-' as a name for stdin,
>then stdin can be one of the _multiple_ files being processed.
>Filters that do not recognize '-' as a name, on the other hand,
>only process stdin if it is the _only_ input.
>
>For example cat(1) and paste(1) do that, head(1) and tail(1) don't.
>And there are other utilities that could do that, but don't.
>Below is a list of text filters from bin/ and usr.bin/
>for which this seems relevant, separated into the two camps.
>
>   Jan
>
>
>These recognize '-' as a name for stdin
>as one of possibly many inputs:
>
>   cat
>   cmp
>   comm
>   cut
>   diff
>   file
>   join
>   lam
>   paste
>   pr
>   sdiff
>   sort
>
>
>These process stdin only if it is
>the only (unnamed) input:
>
>   column  
>   expand
>   fmt
>   fold
>   grep
>   head
>   hexdump
>   nl
>   rev
>   tail
>   ul
>   unexpand
>   unvis
>   vis
>   wc

Bobby has pulled on Sally's hair, so everyone can pull on everyone's
hair.  Your list means nothing.  Please read the standards fully, and
understand the legal lawyer thing.



Re: let head(1) understand `-' as stdin

2016-10-12 Thread Jan Stary
> > > The diff below makes head(1) recognize `-'
> > > as a name for the standard input,
> > > as many other utilities do.

On Oct 11 23:55:26, schwa...@usta.de wrote:
> > Do standards permit that extension?
> 
> POSIX neither requires nor forbids it, but encourages consistency
> among all the utilities taking [file ...] arguments within a given
> operating system:
> 
>   http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html

> > This is a command used in scripts.  Scripts are often portable.  If one
> > operating system has an extension, but others don't, then those
> > scripts become unportable due to use of these extensions.

[Ingo's detailed analysis snipped]

> > I'm not raising a new argument here, it's been raised numerous times
> > when it comes to commands commonly used in scripts.
> > 
> > So consider that first.
> 
> head(1) is firmly a BSD thingy, and all BSDs agree that "-" is a
> file name and not standard input.  POSIX explicitly encourages
> treating it as standard input ***if you do that for other utilities,
> too***, and GNU coreutils has the only head(1) implementation i
> found so far that actually does it.
> 
> The bigger picture seems to be that OpenBSD and illumos tend to resist
> treating "-" as standard input wherever resisting is allowed,
> while GNU embraces treating "-" as standard input wherever allowed.
> Most other systems seem to be somewhat inconsistent, in particular
> in those cases where they imported GNU utilities.
> 
> So much for the facts.
> 
> 
> I see two ways forward that make sense to me:
> 
>  a) Either remain conservative - in line with both BSD and SysV
>     heritage - and not do it unless required by the standard.
> 
>  b) Or switch over to doing it wherever allowed - but then we
>     should do it not just for head(1), but also for tail(1),
>     grep(1), sed(1) and probably several others, and then we
>     should probably also try to push such patches to FreeBSD,
>     DragonFly, NetBSD, and illumos, or at least give them heads-ups.
> 
> Changing only head(1) and leaving everything else as it is does not
> look like a complete plan to me.  Even POSIX wouldn't encourage that.

Thank you for the detailed analysis.

If there is any interest in possibly going with b),
see below for a look at the other text filters.


Let me clarify the idea. 
If a filter recognizes '-' as a name for stdin,
then stdin can be one of the _multiple_ files being processed.
Filters that do not recognize '-' as a name, on the other hand,
only process stdin if it is the _only_ input.

For example cat(1) and paste(1) do that, head(1) and tail(1) don't.
And there are other utilities that could do that, but don't.
Below is a list of text filters from bin/ and usr.bin/
for which this seems relevant, separated into the two camps.

Jan


These recognize '-' as a name for stdin
as one of possibly many inputs:

cat
cmp
comm
cut
diff
file
join
lam
paste
pr
sdiff
sort


These process stdin only if it is
the only (unnamed) input:

column  
expand
fmt
fold
grep
head
hexdump
nl
rev
tail
ul
unexpand
unvis
vis
wc
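
The difference between the two camps can be sketched as follows. This
is only an illustration, not code from any of the utilities listed:
the function name cat_operands and its std_in parameter are made up
here (std_in stands in for stdin so the sketch can be exercised
without a terminal).

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy every operand to `out`, in order.  The operand "-" selects
 * `std_in` (normally stdin), so it can appear anywhere among the
 * named files.  Returns 0 on success, 1 if an operand failed to open.
 */
static int
cat_operands(int n, const char *const names[], FILE *std_in, FILE *out)
{
    int status = 0;
    int i, ch;
    FILE *fp;

    for (i = 0; i < n; i++) {
        fp = strcmp(names[i], "-") == 0 ? std_in : fopen(names[i], "r");
        if (fp == NULL) {
            status = 1;     /* remember the failure, keep going */
            continue;
        }
        while ((ch = getc(fp)) != EOF)
            putc(ch, out);
        if (fp != std_in)   /* never close the shared input stream */
            fclose(fp);
    }
    return status;
}
```

A filter from the second camp would lack the strcmp() test and fall
back to stdin only when n is 0; with operands present, "-" would be
opened as a file of that name.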



Re: let head(1) understand `-' as stdin

2016-10-12 Thread Jeremie Courreges-Anglas
Ingo Schwarze  writes:

> Hi,
>
> Theo de Raadt wrote on Tue, Oct 11, 2016 at 01:35:34PM -0600:
>> jca@ wrote:
>>> Jan Stary  writes:
>
>>>> The diff below makes head(1) recognize `-'
>>>> as a name for the standard input,
>>>> as many other utilities do.
>
>>> Makes sense to me.  The following points could be improved IMO:
>>> - using strcmp sounds cleaner than those char comparisons
>>> - I don't think the man page bits are needed.  Utilities that read from
>>>   stdin are supposed to support `-'.  I'm not sure whether the extra
>>>   example is really helpful.
>>> - should we avoid closing stdin (multiple times)?  Even though our
>>>   fclose(3) seems to cope with this, it seems that neither the
>>>   C standard nor POSIX offer such a guarantee.
>
>> Do standards permit that extension?
>
> POSIX neither requires nor forbids it, but encourages consistency
> among all the utilities taking [file ...] arguments within a given
> operating system:
>
>   http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html
>
>   NAME
>     head - copy the first part of files
>   OPTIONS
>     The head utility shall conform to XBD Utility Syntax Guidelines.
>   STDIN
>     The standard input shall be used if no file operands are
>     specified, and shall be used if a file operand is '-' and the
>     implementation treats the '-' as meaning standard input.
>     Otherwise, the standard input shall not be used. See the INPUT
>     FILES section.
>
>   http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html
>
>   Guideline 13:
>     For utilities that use operands to represent files to be opened
>     for either reading or writing, the '-' operand should be used
>     to mean only standard input (or standard output when it is clear
>     from context that an output file is being specified) or a file
>     named -.
>
>   Where a utility described in the Shell and Utilities volume of
>   POSIX.1-2008 as conforming to these guidelines is required to
>   accept, or not to accept, the operand '-' to mean standard input
>   or output, this usage is explained in the OPERANDS section.
>   Otherwise, if such a utility uses operands to represent files,
>   it is implementation-defined whether the operand '-' stands for
>   standard input (or standard output), or for a file named -.
>
> (Enjoy language lawyers' paradise.)
>
>> This is a command used in scripts.  Scripts are often portable.  If one
>> operating system has an extension, but others don't, then those
>> scripts become unportable due to use of these extensions.
>
>  * 1BSD                first had head(1) (by Bill Joy 1977),
>                        of course treats "-" as a filename
>  * AT&T System V UNIX  didn't provide head(1) at all
>  * NetBSD              treats "-" as a filename
>  * FreeBSD             treats "-" as a filename
>  * DragonFly           treats "-" as a filename
>  * illumos             treats "-" as a filename
>  * Oracle Solaris 11   treats "-" as a filename
>  * GNU coreutils       treats "-" as standard input
>
> Some related utilities:
>
>                  tail(1)   grep(1)   sed(1)
>    source:       UNIX v7   UNIX v4   UNIX v7  (of course all filename)
>
>  * 4.4BSD-L2     filename  filename  filename
>  * System V      filename  filename  filename
>  * OpenBSD       filename  filename  filename
>  * NetBSD        filename  stdin     filename
>  * FreeBSD       filename  stdin     filename
>  * DragonFly     filename  stdin     filename
>  * illumos       filename  filename  filename
>  * Solaris 11    stdin     filename  filename
>  * GNU           stdin     stdin     stdin
>
> cat(1), sort(1): POSIX requires treating "-" as standard input
>
>> I'm not raising a new argument here, it's been raised numerous times
>> when it comes to commands commonly used in scripts.
>> 
>> So consider that first.
>
> head(1) is firmly a BSD thingy, and all BSDs agree that "-" is a
> file name and not standard input.  POSIX explicitly encourages
> treating it as standard input ***if you do that for other utilities,
> too***, and GNU coreutils has the only head(1) implementation i
> found so far that actually does it.
>
> The bigger picture seems to be that OpenBSD and illumos tend to resist
> treating "-" as standard input wherever resisting is allowed,
> while GNU embraces treating "-" as standard input wherever allowed.
> Most other systems seem to be somewhat inconsistent, in particular
> in those cases where they imported GNU utilities.
>
> So much for the facts.

Thanks a lot for this thorough analysis.

>
> I see two ways forward that make sense to me:
>
>  a) Either remain conservative - in line with both BSD and SysV
>     heritage - and not do it unless required by the standard.
>
>  b) Or switch over to doing it wherever allowed - but then we
>     should do it not just for head(1), but also for tail(1),
>     grep(1), sed(1) and probably several others, and then we
>     should probably also try to push such patches to 

Re: let head(1) understand `-' as stdin

2016-10-11 Thread Ingo Schwarze
Hi,

Theo de Raadt wrote on Tue, Oct 11, 2016 at 01:35:34PM -0600:
> jca@ wrote:
>> Jan Stary  writes:

>>> The diff below makes head(1) recognize `-'
>>> as a name for the standard input,
>>> as many other utilities do.

>> Makes sense to me.  The following points could be improved IMO:
>> - using strcmp sounds cleaner than those char comparisons
>> - I don't think the man page bits are needed.  Utilities that read from
>>   stdin are supposed to support `-'.  I'm not sure whether the extra
>>   example is really helpful.
>> - should we avoid closing stdin (multiple times)?  Even though our
>>   fclose(3) seems to cope with this, it seems that neither the
>>   C standard nor POSIX offer such a guarantee.

> Do standards permit that extension?

POSIX neither requires nor forbids it, but encourages consistency
among all the utilities taking [file ...] arguments within a given
operating system:

  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html

  NAME
    head - copy the first part of files
  OPTIONS
    The head utility shall conform to XBD Utility Syntax Guidelines.
  STDIN
    The standard input shall be used if no file operands are
    specified, and shall be used if a file operand is '-' and the
    implementation treats the '-' as meaning standard input.
    Otherwise, the standard input shall not be used. See the INPUT
    FILES section.

  http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html

  Guideline 13:
    For utilities that use operands to represent files to be opened
    for either reading or writing, the '-' operand should be used
    to mean only standard input (or standard output when it is clear
    from context that an output file is being specified) or a file
    named -.

  Where a utility described in the Shell and Utilities volume of
  POSIX.1-2008 as conforming to these guidelines is required to
  accept, or not to accept, the operand '-' to mean standard input
  or output, this usage is explained in the OPERANDS section.
  Otherwise, if such a utility uses operands to represent files,
  it is implementation-defined whether the operand '-' stands for
  standard input (or standard output), or for a file named -.

(Enjoy language lawyers' paradise.)
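
Read literally, the STDIN wording quoted above reduces to a small
decision rule. A sketch of that rule, with a made-up function name
(uses_stdin) and a made-up flag (dash_is_stdin) standing in for the
implementation-defined choice:

```c
#include <string.h>

/*
 * Return 1 if the utility reads standard input, 0 otherwise.
 *   - no file operands: standard input is used;
 *   - an operand "-" and the implementation treats "-" as standard
 *     input (dash_is_stdin != 0): standard input is used;
 *   - otherwise: standard input is not used.
 */
static int
uses_stdin(int nfiles, const char *const files[], int dash_is_stdin)
{
    int i;

    if (nfiles == 0)
        return 1;
    if (!dash_is_stdin)
        return 0;
    for (i = 0; i < nfiles; i++)
        if (strcmp(files[i], "-") == 0)
            return 1;
    return 0;
}
```

With dash_is_stdin set to 0, an operand "-" is just a file named "-",
which is the behaviour the standard also allows.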

> This is a command used in scripts.  Scripts are often portable.  If one
> operating system has an extension, but others don't, then those
> scripts become unportable due to use of these extensions.

 * 1BSD                first had head(1) (by Bill Joy 1977),
                       of course treats "-" as a filename
 * AT&T System V UNIX  didn't provide head(1) at all
 * NetBSD              treats "-" as a filename
 * FreeBSD             treats "-" as a filename
 * DragonFly           treats "-" as a filename
 * illumos             treats "-" as a filename
 * Oracle Solaris 11   treats "-" as a filename
 * GNU coreutils       treats "-" as standard input

Some related utilities:

                 tail(1)   grep(1)   sed(1)
   source:       UNIX v7   UNIX v4   UNIX v7  (of course all filename)

 * 4.4BSD-L2     filename  filename  filename
 * System V      filename  filename  filename
 * OpenBSD       filename  filename  filename
 * NetBSD        filename  stdin     filename
 * FreeBSD       filename  stdin     filename
 * DragonFly     filename  stdin     filename
 * illumos       filename  filename  filename
 * Solaris 11    stdin     filename  filename
 * GNU           stdin     stdin     stdin

cat(1), sort(1): POSIX requires treating "-" as standard input

> I'm not raising a new argument here, it's been raised numerous times
> when it comes to commands commonly used in scripts.
> 
> So consider that first.

head(1) is firmly a BSD thingy, and all BSDs agree that "-" is a
file name and not standard input.  POSIX explicitly encourages
treating it as standard input ***if you do that for other utilities,
too***, and GNU coreutils has the only head(1) implementation i
found so far that actually does it.

The bigger picture seems to be that OpenBSD and illumos tend to resist
treating "-" as standard input wherever resisting is allowed,
while GNU embraces treating "-" as standard input wherever allowed.
Most other systems seem to be somewhat inconsistent, in particular
in those cases where they imported GNU utilities.

So much for the facts.


I see two ways forward that make sense to me:

 a) Either remain conservative - in line with both BSD and SysV
    heritage - and not do it unless required by the standard.

 b) Or switch over to doing it wherever allowed - but then we
    should do it not just for head(1), but also for tail(1),
    grep(1), sed(1) and probably several others, and then we
    should probably also try to push such patches to FreeBSD,
    DragonFly, NetBSD, and illumos, or at least give them heads-ups.

Changing only head(1) and leaving everything else as it is does not
look like a complete plan to me.  Even POSIX wouldn't encourage that.

Yours,
  Ingo



Re: let head(1) understand `-' as stdin

2016-10-11 Thread Jan Stary
On Oct 11 13:35:34, dera...@openbsd.org wrote:
> This is a command used in scripts.  Scripts are often portable.  If one
> operating system has an extension, but others don't, then those
> scripts become unportable due to use of these extensions.

GNU head(1) has it, Solaris does not.
(I don't have access to others right now.)



Re: let head(1) understand `-' as stdin

2016-10-11 Thread Jan Stary
On Oct 11 21:27:54, j...@wxcvbn.org wrote:
> Jan Stary  writes:
> 
> > The diff below makes head(1) recognize `-'
> > as a name for the standard input,
> > as many other utilities do.
> 
> Makes sense to me.  The following points could be improved IMO:

Updated diff below.

> - using strcmp sounds cleaner than those char comparisons

OK

> - I don't think the man page bits are needed.  Utilities that read from
>   stdin are supposed to support `-'.  I'm not sure whether the extra
>   example is really helpful.

I have removed the example.
I think the one sentence about "-" should stay;
other utils which recognize "-" mention it.

> - should we avoid closing stdin (multiple times)?

fixed


OK?


Jan


Index: head.1
===================================================================
RCS file: /cvs/src/usr.bin/head/head.1,v
retrieving revision 1.23
diff -u -p -r1.23 head.1
--- head.1	25 Oct 2015 21:50:32 -0000	1.23
+++ head.1	11 Oct 2016 21:05:07 -0000
@@ -47,6 +47,9 @@ utility copies the first
 lines of each specified
 .Ar file
 to the standard output.
+A name of
+.Sq -
+is recognized as standard input.
 If no files are named,
 .Nm
 copies lines from the standard input.
Index: head.c
===================================================================
RCS file: /cvs/src/usr.bin/head/head.c,v
retrieving revision 1.21
diff -u -p -r1.21 head.c
--- head.c	20 Mar 2016 17:14:51 -0000	1.21
+++ head.c	11 Oct 2016 21:05:07 -0000
@@ -30,6 +30,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -93,7 +94,8 @@ main(int argc, char *argv[])
 			if (pledge("stdio", NULL) == -1)
 				err(1, "pledge");
 		} else {
-			if ((fp = fopen(*argv, "r")) == NULL) {
+			fp = strcmp(*argv, "-") ? fopen(*argv, "r") : stdin;
+			if (fp == NULL) {
 				warn("%s", *argv++);
 				status = 1;
 				continue;
@@ -101,7 +103,8 @@ main(int argc, char *argv[])
 			if (argc > 1) {
 				if (!firsttime)
 					putchar('\n');
-				printf("==> %s <==\n", *argv);
+				printf("==> %s <==\n",
+				    fp == stdin ? "(stdin)" : *argv);
 			}
 			++argv;
 		}
@@ -109,7 +112,8 @@ main(int argc, char *argv[])
 			while ((ch = getc(fp)) != EOF)
 				if (putchar(ch) == '\n')
 					break;
-		fclose(fp);
+		if (fp != stdin)
+			fclose(fp);
 	}
 	/*NOTREACHED*/
 }
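
Taken out of the diff context, the three changes above amount to the
following pattern. This is a standalone sketch, not part of the patch:
the helper names open_operand, operand_label and close_operand are
invented here for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Map an operand to a stream: "-" selects stdin, anything else is opened. */
static FILE *
open_operand(const char *name)
{
    return strcmp(name, "-") == 0 ? stdin : fopen(name, "r");
}

/* Name to print in the "==> name <==" header, following the diff. */
static const char *
operand_label(FILE *fp, const char *name)
{
    return fp == stdin ? "(stdin)" : name;
}

/*
 * Close a stream obtained from open_operand().  stdin is left open,
 * so a second "-" operand never touches an already closed stream.
 */
static int
close_operand(FILE *fp)
{
    return fp == stdin ? 0 : fclose(fp);
}
```

Skipping fclose() for stdin addresses the review comment about closing
stdin more than once: using a stream after it has been closed is not
something the C standard promises will work.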



Re: let head(1) understand `-' as stdin

2016-10-11 Thread Theo de Raadt
> On 2016/10/11 13:35, Theo de Raadt wrote:
> > > Jan Stary  writes:
> > > 
> > > > The diff below makes head(1) recognize `-'
> > > > as a name for the standard input,
> > > > as many other utilities do.
> > > 
> > > Makes sense to me.  The following points could be improved IMO:
> > > - using strcmp sounds cleaner than those char comparisons
> > > - I don't think the man page bits are needed.  Utilities that read from
> > >   stdin are supposed to support `-'.  I'm not sure whether the extra
> > >   example is really helpful.
> > > - should we avoid closing stdin (multiple times)?  Even though our
> > >   fclose(3) seems to cope with this, it seems that neither the
> > >   C standard nor POSIX offer such a guarantee.
> > 
> > Do standards permit that extension?
> > 
> > > This is a command used in scripts.  Scripts are often portable.  If one
> > > operating system has an extension, but others don't, then those
> > > scripts become unportable due to use of these extensions.
> > 
> > I'm not raising a new argument here, it's been raised numerous times
> > when it comes to commands commonly used in scripts.
> > 
> > So consider that first.
> 
> Standards permit it but don't require it, so it's already a mess.
> 
> "The standard input shall be used if no file operands are specified, and
> shall be used if a file operand is '-' and the implementation treats the
> '-' as meaning standard input. Otherwise, the standard input shall not
> be used. See the INPUT FILES section."
> 
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html
> 
> Same for tail, of course.

Well in that case, support for "-" is probably increasing compatibility
rather than diminishing it.



Re: let head(1) understand `-' as stdin

2016-10-11 Thread Stuart Henderson
On 2016/10/11 13:35, Theo de Raadt wrote:
> > Jan Stary  writes:
> > 
> > > The diff below makes head(1) recognize `-'
> > > as a name for the standard input,
> > > as many other utilities do.
> > 
> > Makes sense to me.  The following points could be improved IMO:
> > - using strcmp sounds cleaner than those char comparisons
> > - I don't think the man page bits are needed.  Utilities that read from
> >   stdin are supposed to support `-'.  I'm not sure whether the extra
> >   example is really helpful.
> > - should we avoid closing stdin (multiple times)?  Even though our
> >   fclose(3) seems to cope with this, it seems that neither the
> >   C standard nor POSIX offer such a guarantee.
> 
> Do standards permit that extension?
> 
> > This is a command used in scripts.  Scripts are often portable.  If one
> > operating system has an extension, but others don't, then those
> > scripts become unportable due to use of these extensions.
> 
> I'm not raising a new argument here, it's been raised numerous times
> when it comes to commands commonly used in scripts.
> 
> So consider that first.

Standards permit it but don't require it, so it's already a mess.

"The standard input shall be used if no file operands are specified, and
shall be used if a file operand is '-' and the implementation treats the
'-' as meaning standard input. Otherwise, the standard input shall not
be used. See the INPUT FILES section."

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html

Same for tail, of course.



Re: let head(1) understand `-' as stdin

2016-10-11 Thread Theo de Raadt
> Jan Stary  writes:
> 
> > The diff below makes head(1) recognize `-'
> > as a name for the standard input,
> > as many other utilities do.
> 
> Makes sense to me.  The following points could be improved IMO:
> - using strcmp sounds cleaner than those char comparisons
> - I don't think the man page bits are needed.  Utilities that read from
>   stdin are supposed to support `-'.  I'm not sure whether the extra
>   example is really helpful.
> - should we avoid closing stdin (multiple times)?  Even though our
>   fclose(3) seems to cope with this, it seems that neither the
>   C standard nor POSIX offer such a guarantee.

Do standards permit that extension?

This is a command used in scripts.  Scripts are often portable.  If one
operating system has an extension, but others don't, then those
scripts become unportable due to use of these extensions.

I'm not raising a new argument here, it's been raised numerous times
when it comes to commands commonly used in scripts.

So consider that first.