The file position after an fclose on an input stream is left at the end of the buffered data, not where the caller thinks it was left. Normally this does not matter. However, as David Korn pointed out on the Austin Group mailing list, the behavior becomes important for programs that share a file descriptor, as can happen when using the shell. Consider the following command which, logically, should print 4999 (the subshell is used to skip the first line):

    neal@desdemona:~/src/textutils-2.0.20 (0)$ uname -a
    Linux desdemona 2.2.17 #4 SMP Sat Sep 16 21:51:08 EST 2000 i686 unknown
    neal@desdemona:~/src/textutils-2.0.20 (0)$ perl -e \
    > 'print "test\n" x 5000' | (src/head -n1 >/dev/null; cat) | wc -l
    4181

Instead of 4999, we get some seemingly random number. What has happened is that, because head buffers its input, cat starts reading well past the intended position.

This problem is not restricted to glibc or the GNU text utilities. Similar behavior can be seen on Tru64 (using their implementation of head):

    nwalfiel@saturn:~$ uname -a
    OSF1 saturn.cs.uml.edu V5.0 1094 alpha
    nwalfiel@saturn:~$ perl -e 'print "test\n" x 5000' | \
    > (head -n1 >/dev/null; cat) | wc -l
    3362

And SunOS:

    nwalfiel@force1:~$ uname -a
    SunOS force1 4.1.3_U1 2 sun4c
    nwalfiel@force1:~$ perl -e 'print "test\n" x 5000' | \
    > (/usr/ucb/head -1 >/dev/null; cat) | wc -l
    4181

The standards have little to say about this behavior. For instance, the third version of the Single Unix Specification, in its description of fclose, says nothing about how the file position is to be left:

    The fclose() function shall cause the stream pointed to by
    stream to be flushed and the associated file to be closed. Any
    unwritten buffered data for the stream shall be written to the
    file; any unread buffered data shall be discarded. Whether or
    not the call succeeds, the stream shall be disassociated from
    the file and any buffer set by the setbuf() or setvbuf()
    function shall be disassociated from the stream. If the
    associated buffer was automatically allocated, it shall be
    deallocated.

And, according to the same standard, flushing an input stream (using fflush) is undefined:

    If stream points to an output stream or an update stream in
    which the most recent operation was not input, fflush() shall
    cause any unwritten data for that stream to be written to the
    file, [CX] and the st_ctime and st_mtime fields of the
    underlying file shall be marked for update.

However, the rationale section for fflush does describe this case:

    Data buffered by the system may make determining the validity
    of the position of the current file descriptor impractical.
    Thus, enforcing the repositioning of the file descriptor after
    fflush() on streams open for read() is not mandated by IEEE Std
    1003.1-2001.

This means that glibc does not have to do anything about this. However, after a glance at the libio code, it seems to me that it would be possible to reposition the file position in _IO_new_fclose. Yet even this change would not make head behave correctly everywhere: as already mentioned, at least SunOS and Tru64 also leave the file position of an input stream at the end of the buffered data. Therefore, to be completely robust, textutils itself would need to be changed. My impression is that a call to fsetpos would not help; using unbuffered input, e.g. by calling setvbuf before starting to read from the stream, would, although it is slower.

If this change is desirable, I would be happy to discuss it a bit more and then implement it.
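
To illustrate the setvbuf suggestion, here is a minimal sketch in C. It is not taken from textutils or glibc: the function name skip_line_unbuffered is hypothetical, and the sketch assumes a POSIX system and a C library that reads one byte at a time from an unbuffered stream (the usual behavior, but worth verifying on each platform). Under those assumptions, the shared descriptor is left positioned just past the first newline.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (illustration only, not from textutils):
   consume exactly one line from FD through an unbuffered stdio
   stream.  Returns 0 if a newline was consumed, -1 on EOF or error.  */
int
skip_line_unbuffered (int fd)
{
  /* Work on a duplicate so that fclose does not close the caller's
     descriptor; both descriptors share the same file position.  */
  int copy = dup (fd);
  if (copy < 0)
    return -1;

  FILE *in = fdopen (copy, "r");
  if (in == NULL)
    {
      close (copy);
      return -1;
    }

  /* setvbuf must be called before any other operation on the stream.  */
  if (setvbuf (in, NULL, _IONBF, 0) != 0)
    {
      fclose (in);
      return -1;
    }

  /* Read up to and including the first newline.  Assumption: with
     _IONBF the library does not read ahead past the requested byte.  */
  int c;
  while ((c = getc (in)) != EOF && c != '\n')
    continue;

  fclose (in);
  return c == '\n' ? 0 : -1;
}
```

A later reader of the same descriptor (cat, in the shell example) would then start at the second line. The price is roughly one read call per byte, which is where the slowdown comes from.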