On Thu, Oct 09, 2025 at 08:54:56PM -0700, Collin Funk wrote:
> On all versions of glibc, getline and getdelim do not NUL-terminate the
> line buffer if the first read character is EOF as required by POSIX
> [1]. The easiest way to demonstrate this is with this test program and
> an empty file:
>
> +++ b/doc/posix-functions/getdelim.texi
> @@ -19,6 +19,11 @@ @node getdelim
> This function crashes when passed a pointer to a NULL buffer together with a
> pointer to a non-zero buffer size on some platforms:
> FreeBSD 8.0.
> +@item
> +This function does not null terminate the buffer when the first
> +character read is EOF on some platforms:
> +@c https://sourceware.org/PR28038
> +glibc 2.42.
> @end itemize

The POSIX wording is a bit unusual: a return of -1 is required both if an error is encountered (the stream's error indicator shall be set and errno shall reflect the error) and if EOF is encountered (the stream's end-of-file indicator shall be set, and errno is not set). In the latter case, it is not obvious whether -1 is supposed to imply an error or a successful return. Usually, when an error return pre-empts a normal return, POSIX says that the contents of a buffer are either indeterminate or required to be unchanged from entry. But since encountering EOF does not set the stream's error indicator, I can see why one could argue that -1 is not always an error return, and that when it is not an error return, we should guarantee that buf is NUL-terminated.

It helps to remember that when POSIX added getdelim/getline back in 2008, it used glibc as the reference implementation. But looking at the history over the years, it is quite obvious that the POSIX wording at the time was not identical to the glibc behavior at the time. So we now have the odd case of arguing that glibc needs to change its behavior to match POSIX, when originally it was POSIX that was trying to match glibc.
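For example, because getline() returns -1 both on EOF and on error, a caller that needs to tell the two cases apart has to consult the stream indicators rather than the return value. A minimal sketch, assuming a POSIX.1-2008 environment; count_lines() is a hypothetical helper, not anything in gnulib or glibc:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read every line of FP with getline() and return
   how many lines were read, or -1 if the loop stopped because of a read
   error rather than end-of-file.  */
static long
count_lines (FILE *fp)
{
  char *line = NULL;   /* getline() allocates and grows this buffer */
  size_t size = 0;
  long count = 0;

  while (getline (&line, &size, fp) != -1)
    count++;

  /* getline() returned -1.  POSIX requires an error to set the stream's
     error indicator, whereas plain EOF sets only the end-of-file
     indicator, so ferror() distinguishes the two cases.  */
  long result = ferror (fp) ? -1 : count;
  free (line);
  return result;
}
```

Note that this sketch never inspects the buffer after the final -1 return, so it behaves the same whether or not the implementation writes a NUL into the buffer at EOF.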
In this particular case, there is at least one project that observably behaves differently due to the glibc change, where the workaround was to add a strndup() after each getline() call, doubling the malloc() pressure:
https://gitlab.com/nbdkit/nbdkit/-/commit/01b8e557ce129b

In short, while your patch changes the behavior on an empty file to guarantee that the buffer is NUL-terminated even though it is empty (on the grounds that the -1 return in THAT scenario is not an error, per se, because errno is not set), it ALSO has the side effect of changing the buffer on a non-empty file, where the last line is already in the buffer when EOF is then encountered to end the loop.

Pre-patch, it was possible (although of questionable portability, because the POSIX wording is unclear) to call getline() in a loop until it returns -1, and then still have the contents of the file's final line (assuming the file was non-empty) in the buffer with no extra effort. This works great for grabbing the summary line of du(1), for example. Post-patch, glibc now ALWAYS sets buf[0] to \0 on EOF, even if the file was non-empty, which breaks that convenient means of grabbing the last line of du output.

Is there any way to refine this patch in gnulib and glibc so as not to break the behavior on non-empty files? After all, if the file is non-empty, then by the time we encounter EOF, the buffer is guaranteed to be NUL-terminated from the previous iteration. It is only when the file is empty that there was no previous line, and therefore no guarantee of a NUL terminator in the buffer. Would it work to instead add a NUL terminator when the buffer is first allocated, before reading from the file, and then have EOF leave the buffer unchanged, rather than truncating it by writing \0 to buf[0]?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
