On 03/30/2014 09:40 PM, Denis Excoffier wrote:
> Hello,
> 
> head -n -1 -- -
> or equivalently
> head -n -1
> returns immediately (ie does not wait for further stdin) and prints nothing.
> 
> I use coreutils 8.22 compiled (with gcc-4.8.2) on top of darwin 13.1.0 
> (Mavericks).
> 
> However the following seem to work perfectly:
> head -n 1
> head -c -1
> cat | head -n -1
> head -n -1 ---presume-input-pipe
> on cygwin: head -n -1
> 
> What is weird on my system is lseek() at the beginning of 
> elide_tail_lines_file():
> lseek(fd, 0, SEEK_CUR) returns a (random?) number, something like 6735, 539 
> etc.
> lseek(fd, 0, SEEK_END) returns 0

So:
  head -n -1 # returns immediately
while:
  cat | head -n -1 # waits as expected

It seems we might be using non portable code here. POSIX says:

  "The behavior of lseek() on devices which are incapable of seeking is 
implementation-defined.
  The value of the file offset associated with such a device is undefined."

and also:

  "The lseek() function shall fail [with ESPIPE if] the fildes argument is 
associated with a pipe, FIFO, or socket"

So tty devices would come outside of this POSIX scope.
Furthermore the FreeBSD lseek man pages states:

  "Some devices are incapable of seeking and POSIX does not specify which
   devices must support it.

   Linux specific restrictions: using lseek on a tty device returns
   ESPIPE. Other systems return the number of written characters, using
   SEEK_SET to set the counter. Some devices, e.g. /dev/null do not cause
   the error ESPIPE, but return a pointer which value is undefined."

Now head(1) isn't the only place we use this logic. In dd we have:

  offset = lseek (STDIN_FILENO, 0, SEEK_CUR);
  input_seekable = (0 <= offset);

I wonder should be be using something like:

  bool seekable (int fd)
  {
    return ! isatty (fd) && lseek (fd, 0, SEEK_CUR) >= 0;
  }

Though this only handles the tty case, and there
could be other devices for which this could be an issue.
So the general question is, is there a way we can robustly
determine if we have a seekable device or not?
Perhaps by using SEEK_SET in combination with SEEK_CUR,
but notice the BSD lseek man page above says that tty devices
support SEEK_SET also :/ Anyway...

Note the original head(1) code to detect seekable input was introduced with:
  http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commit;h=61ba51a6
and that was changed recently due to a coverity identified logic issue, to:
  http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commit;h=5fdb5082
However that now logically consistent code will return immediately in your case.

I also notice the related `head -c -1` check is more conservative
in that it only uses the more efficient lseek() code for regular files,
which would mean we don't operate as efficiently as we could on a disk device
for example. But that's much better than undefined operation of course.
If we were to do the same for lines then we would also introduce a change
in behavior with devices like /dev/zero. Currently on Linux, this will return 
immediately:
  head -n -1 /dev/zero
I.E. we currently treat such devices as empty, and return immediately with 
success status,
whereas treating as a stream of NULs, would result in memory exhaustion while 
buffering
waiting for a complete line. That is probably the more consistent operation at 
least.

So the attached uses this more conservative test for the --lines=-N case.

thanks,
Pádraig.
>From 386bdbd2e9b9af1813bc4ac205755c0ce17f5cda Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <[email protected]>
Date: Mon, 31 Mar 2014 13:16:56 +0100
Subject: [PATCH] head: with --lines=-N process devices more consistently

* src/head.c (elide_tail_lines_file): Restrict lseek() logic
to regular files, as it's not possible to determine if a device
is really seekable or not.  On BSD for example, lseek() returns
>= 0 for tty devices, while on Linux 0 will be returned for /dev/zero
which would therefore be ignored without this change.
Fixes http://bugs.gnu.org/17145
Redported by Denis Excoffier
---
 NEWS       |    4 ++++
 src/head.c |    8 +++++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index 582f060..b37a21a 100644
--- a/NEWS
+++ b/NEWS
@@ -24,6 +24,10 @@ GNU coreutils NEWS                                    -*- outline -*-
   now copies all input to stdout.  Previously nothing was output in this case.
   [bug introduced with the --lines=-N feature in coreutils-5.0.1]
 
+  head --lines=-N now handles devices more consistently, not ignoring data
+  from virtual devices like /dev/zero, or on BSD systems data from tty devices.
+  [bug introduced with the --lines=-N feature in coreutils-5.0.1]
+
   ln -sf now replaces symbolic links whose targets can't exist.  Previously
   it would display an error, requiring --no-dereference to avoid the issue.
   [bug introduced in coreutils-5.3.0]
diff --git a/src/head.c b/src/head.c
index b833af6..d3d8eca 100644
--- a/src/head.c
+++ b/src/head.c
@@ -414,9 +414,9 @@ elide_tail_bytes_pipe (const char *filename, int fd, uintmax_t n_elide_0)
 static bool
 elide_tail_bytes_file (const char *filename, int fd, uintmax_t n_elide)
 {
-  struct stat stats;
+  struct stat st;
 
-  if (presume_input_pipe || fstat (fd, &stats) || ! S_ISREG (stats.st_mode))
+  if (presume_input_pipe || fstat (fd, &st) || ! S_ISREG (st.st_mode))
     {
       return elide_tail_bytes_pipe (filename, fd, n_elide);
     }
@@ -735,7 +735,9 @@ elide_tail_lines_seekable (const char *pretty_filename, int fd,
 static bool
 elide_tail_lines_file (const char *filename, int fd, uintmax_t n_elide)
 {
-  if (!presume_input_pipe)
+  struct stat st;
+
+  if (! presume_input_pipe && fstat (fd, &st) == 0 && S_ISREG (st.st_mode))
     {
       /* Find the offset, OFF, of the Nth newline from the end,
          but not counting the last byte of the file.
-- 
1.7.7.6

Reply via email to