bug#67490: [PATCH v2] tail: fix tailing sysfs files on large page kernels
On 01/12/2023 01:54, Paul Eggert wrote: On 11/30/23 12:11, Pádraig Brady wrote: Though that will generally give 128K, which is good when processing all of a file, but perhaps overkill when processing just the last part of a file. The 128 KiB number was computed as being better for apps like 'sed' that typically read all or most of the file. 'tail' sometimes behaves that way (e.g., 'tail -c +10') and so 'tail' should use 128 KiB in those cases. The simplest way to do that is for 'tail' to use 128 KiB all the time - that would cost little for uses like plain 'tail' and it could be a significant win for uses like 'tail -c +10'. Yes I agree we should use io_blksize() in other routines in tail where we may dump lots of a file. However in this (most common) case the routine is dealing with the end of a regular file, so it's probably best to somewhat minimize the amount of data read, and more directly check the page_size which is issue at hand. I've pushed the fix at https://github.com/coreutils/coreutils/commit/73d119f4f where the adjustment (which also corresponds to what we do in wc) is: if (sb->st_size % page_size == 0) bufsize = MAX (BUFSIZ, page_size); I'll follow up with another patch to address the performance aspect, which uses io_blksize() where appropriate. (As an aside, the 128 KiB number was computed in 2014. These days 256 KiB might be better if someone could take the time to measure) I periodically check with the documented script, but I should update the comment when I do that. I'll update the date in the comment now at least. I quickly tested a few systems here, which suggests 256KiB _may_ be a more appropriate default now. More testing required. Marking this as done. thanks, Pádraig
bug#67490: [PATCH v2] tail: fix tailing sysfs files on large page kernels
On 11/30/23 12:11, Pádraig Brady wrote: Though that will generally give 128K, which is good when processing all of a file, but perhaps overkill when processing just the last part of a file. The 128 KiB number was computed as being better for apps like 'sed' that typically read all or most of the file. 'tail' sometimes behaves that way (e.g., 'tail -c +10') and so 'tail' should use 128 KiB in those cases. The simplest way to do that is for 'tail' to use 128 KiB all the time - that would cost little for uses like plain 'tail' and it could be a significant win for uses like 'tail -c +10'. (As an aside, the 128 KiB number was computed in 2014. These days 256 KiB might be better if someone could take the time to measure)
bug#67490: [PATCH v2] tail: fix tailing sysfs files on large page kernels
On Thu, Nov 30, 2023 at 08:11:52PM +, Pádraig Brady wrote: > Much clearer thanks. > > On my system: > > $ stat /sys/kernel/profiling > File: /sys/kernel/profiling > Size: 4096Blocks: 0 IO Block: 4096 regular file > > I can easily repro by setting the buffer size < PAGE_SIZE. Oh, clever. > So this patch handles the case where sysfs reports a file is a certain size, > but it isn't really. In that case seeking to anywhere other than the start > doesn't give an error, but reading returns nothing. So we use a buffer size > large enough (>= PAGE_SIZE as inferred from st_blksize) so that we'll be > reading from the start of the file in this case. Exactly. > Note st_blksize can have unusual values, so it might be better > to use the io_blksize() wrapper to sanitize the values. > Though that will generally give 128K, which is good when processing all of a > file, > but perhaps overkill when processing just the last part of a file. > > So perhaps it's better to use buffer_size = MAX (BUFSIZ, getpagesize ()) > and avoid all the st_blksize edge cases. Is there any performance concern about switching to 64K reads for "normal" files (on these systems)? > I'll think about it a little, and make the adjustments. Appreciated! -dann > thanks! > Pádraig.
bug#67490: [PATCH v2] tail: fix tailing sysfs files on large page kernels
Much clearer thanks. On my system: $ stat /sys/kernel/profiling File: /sys/kernel/profiling Size: 4096 Blocks: 0 IO Block: 4096 regular file I can easily repro by setting the buffer size < PAGE_SIZE. So this patch handles the case where sysfs reports a file is a certain size, but it isn't really. In that case seeking to anywhere other than the start doesn't give an error, but reading returns nothing. So we use a buffer size large enough (>= PAGE_SIZE as inferred from st_blksize) so that we'll be reading from the start of the file in this case. Note st_blksize can have unusual values, so it might be better to use the io_blksize() wrapper to sanitize the values. Though that will generally give 128K, which is good when processing all of a file, but perhaps overkill when processing just the last part of a file. So perhaps it's better to use buffer_size = MAX (BUFSIZ, getpagesize ()) and avoid all the st_blksize edge cases. I'll think about it a little, and make the adjustments. thanks! Pádraig.
bug#67490: [PATCH v2] tail: fix tailing sysfs files on large page kernels
Changes from v1: - Fallback to BUFSIZ if fstat fails instead of exiting. - Add a comment block to explain why the blksize is used in file_lines() (which I hope also clarifies why it is not needed elsewhere). - Only use blksize if it is > BUFSIZ (I had regressed to 4K reads, oops). - Update comments to reference bufsize instead of BUFSIZ. - Update subject because /proc is not affected, and it isn't limited to follow mode. - Add a test. * src/tail.c (file_lines): Consider a full blocksize when searching backwards to find sysfs file content on systems with large page sizes. * tests/tail/tail-1-sysfs.sh: Add a new test. * tests/local.mk: Reference the new test. Fixes https://bugs.gnu.org/67490 --- src/tail.c | 41 +++--- tests/local.mk | 1 + tests/tail/tail-1-sysfs.sh | 31 3 files changed, 61 insertions(+), 12 deletions(-) create mode 100755 tests/tail/tail-1-sysfs.sh diff --git a/src/tail.c b/src/tail.c index c45f3b65a..f3219757d 100644 --- a/src/tail.c +++ b/src/tail.c @@ -518,19 +518,32 @@ static bool file_lines (char const *pretty_filename, int fd, uintmax_t n_lines, off_t start_pos, off_t end_pos, uintmax_t *read_pos) { - char buffer[BUFSIZ]; + char *buffer; size_t bytes_read; + blksize_t bufsize = BUFSIZ; off_t pos = end_pos; + bool ok = true; + struct stat stats; if (n_lines == 0) return true; + if (fstat (fd, ) == 0) +/* sysfs files have a blksize == kernel page size, which can be > BUFSIZ. + lseek SEEK_END moves to the next page, which is likely more than + BUFSIZ away from the end of the actual data. Use a buffer large enough + to hold an entire page, so that the window we search for data is large + enough to find it. */ +bufsize = MAX(BUFSIZ, ST_BLKSIZE (stats)); + + buffer = xmalloc (bufsize); + /* Set 'bytes_read' to the size of the last, probably partial, buffer; - 0 < 'bytes_read' <= 'BUFSIZ'. */ - bytes_read = (pos - start_pos) % BUFSIZ; + 0 < 'bytes_read' <= 'bufsize'. */ + bytes_read = (pos - start_pos) % bufsize; if (bytes_read == 0) -bytes_read = BUFSIZ; - /* Make 'pos' a multiple of 'BUFSIZ' (0 if the file is short), so that all +bytes_read = bufsize; + /* Make 'pos' a multiple of 'bufsize' (0 if the file is short), so that all reads will be on block boundaries, which might increase efficiency. */ pos -= bytes_read; xlseek (fd, pos, SEEK_SET, pretty_filename); @@ -538,7 +551,8 @@ file_lines (char const *pretty_filename, int fd, uintmax_t n_lines, if (bytes_read == SAFE_READ_ERROR) { error (0, errno, _("error reading %s"), quoteaf (pretty_filename)); - return false; + ok = false; + goto free_buffer; } *read_pos = pos + bytes_read; @@ -565,7 +579,7 @@ file_lines (char const *pretty_filename, int fd, uintmax_t n_lines, xwrite_stdout (nl + 1, bytes_read - (n + 1)); *read_pos += dump_remainder (false, pretty_filename, fd, end_pos - (pos + bytes_read)); - return true; + goto free_buffer; } } @@ -577,23 +591,26 @@ file_lines (char const *pretty_filename, int fd, uintmax_t n_lines, xlseek (fd, start_pos, SEEK_SET, pretty_filename); *read_pos = start_pos + dump_remainder (false, pretty_filename, fd, end_pos); - return true; + goto free_buffer; } - pos -= BUFSIZ; + pos -= bufsize; xlseek (fd, pos, SEEK_SET, pretty_filename); - bytes_read = safe_read (fd, buffer, BUFSIZ); + bytes_read = safe_read (fd, buffer, bufsize); if (bytes_read == SAFE_READ_ERROR) { error (0, errno, _("error reading %s"), quoteaf (pretty_filename)); - return false; + ok = false; + goto free_buffer; } *read_pos = pos + bytes_read; } while (bytes_read > 0); - return true; +free_buffer: + free (buffer); + return ok; } /* Print the last N_LINES lines from the end of the standard input, diff --git a/tests/local.mk b/tests/local.mk index a5fb62d96..af41337b1 100644 --- a/tests/local.mk +++ b/tests/local.mk @@ -257,6 +257,7 @@ all_tests = \ tests/seq/seq-precision.sh \ tests/head/head.pl \ tests/head/head-elide-tail.pl\ + tests/tail/tail-1-sysfs.sh \ tests/tail/tail-n0f.sh \ tests/ls/ls-misc.pl \ tests/date/date.pl \ diff --git a/tests/tail/tail-1-sysfs.sh b/tests/tail/tail-1-sysfs.sh new file mode 100755 index 0..078366588 --- /dev/null +++ b/tests/tail/tail-1-sysfs.sh @@ -0,0 +1,31 @@ +#!/bin/sh +# sysfs files have weird