Jim Meyering wrote:
> Paul Eggert wrote:
>> On further thought, the heuristic is also incorrect for file
>> systems that compress their data.  So I installed this further
>> patch.
>>
>> Oh, well.  At least the code is simpler now.  Simple and slow
>> is better than complicated and fast and occasionally wrong.
> ...
>> Subject: [PATCH] grep: don't falsely report compressed text files as binary
>>
>> * NEWS: Document this.
>> * src/main.c (file_is_binary): Remove the heuristic based on
>> st_blocks, as it does not work for compressed file systems.
>> On Solaris, it'd be cheap to test whether the file system is known
>> to be uncompressed, which allow the heuristic, but Solaris has
>> SEEK_HOLE so there's little point.
>
> Hi Paul,
>
> Without the st_blocks-based heuristic, grep's big-hole test now fails
> (exhausts memory and exits with status 2) on an ext4 file system with
> a recent linux kernel.
> That happens because while SEEK_HOLE and SEEK_DATA are now defined,
> the kernel's ext4 lseek/SEEK_HOLE support is just a stub that simply
> returns the length of the file.
>
> For the record, the SEEK_HOLE support for btrfs and xfs in
> linux-3.4.4 (F17) works the way I would expect, and it looks
> like ocfs2 is fine, too.
>
> Here's a demo:
>
> SEEK_HOLE works (detects the hole) with btrfs (SEEK_HOLE == 4):
>
>   $ perl -e '$f=*STDERR; sysseek($f,2**22,0); syswrite($f,"a");' \
>       -e 'print 0+sysseek($f,0,4)' 2> j; stat -f --fo=\ %T .
>   0 btrfs
>
> SEEK_HOLE is not usable (reports "hole" at EOF) with ext4:
> stat -f report ext2/ext3, but that's only looking at the magic number.
> It's really ext4:
>
>   $ perl -e '$f=*STDERR; sysseek($f,2**22,0); syswrite($f,"a");' \
>       -e 'print 0+sysseek($f,0,4)' 2> j; stat -f --fo=\ %T .
>   4194305 ext2/ext3
>
> tmpfs uses the same code,
>
>   4194305 tmpfs

A quick update: at least with recent Linux kernels (3.5.0+), tmpfs now
does have SEEK_HOLE support.  Confirmed on Fedora Rawhide.
Thanks to Jeff Layton for the tip.
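
For anyone who wants to repeat the check without Perl, here is a rough
Python sketch of the same probe as the demo quoted above: seek 4 MiB into
a fresh file, write one byte, then ask lseek for the first hole.  The
function names (probe_seek_hole, describe) are made up for illustration
and are not part of grep or any library.  It assumes a Python whose os
module exposes SEEK_HOLE, and it treats "hole reported only at EOF" as
"not usable", which cannot tell a stub implementation apart from a file
system that simply does not store sparse files.

```python
import os
import tempfile

FILE_SIZE = 2**22 + 1  # a 4 MiB hole followed by one data byte


def probe_seek_hole(directory="."):
    """Return the first hole offset lseek reports, or None if unsupported.

    Hypothetical helper: creates a sparse scratch file in `directory`,
    so the result describes the file system that directory lives on.
    """
    seek_hole = getattr(os, "SEEK_HOLE", None)
    if seek_hole is None:
        return None  # this platform or Python build has no SEEK_HOLE
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.lseek(fd, 2**22, os.SEEK_SET)  # skip ahead, leaving a hole
        os.write(fd, b"a")
        try:
            return os.lseek(fd, 0, seek_hole)
        except OSError:
            return None  # lseek rejected SEEK_HOLE for this file
    finally:
        os.close(fd)
        os.unlink(path)


def describe(offset, size=FILE_SIZE):
    """Classify a probe result (hypothetical labels)."""
    if offset is None:
        return "unsupported"
    if offset >= size:
        return "not usable"  # the only hole reported is the one at EOF
    return "usable"


if __name__ == "__main__":
    result = probe_seek_hole(".")
    print(result, describe(result))
```

Run it from a directory on the file system of interest.  A result below
the file size corresponds to the btrfs case in the demo; a result equal
to the file size (4194305) corresponds to the ext4 and older tmpfs cases.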
