On 07/30/2012 12:05 PM, Jim Meyering wrote:
> Paul Eggert wrote:
>> On further thought, the heuristic is also incorrect for file
>> systems that compress their data. So I installed this further
>> patch.
>>
>> Oh, well. At least the code is simpler now. Simple and slow
>> is better than complicated and fast and occasionally wrong.
> ...
>> Subject: [PATCH] grep: don't falsely report compressed text files as binary
>>
>> * NEWS: Document this.
>> * src/main.c (file_is_binary): Remove the heuristic based on
>> st_blocks, as it does not work for compressed file systems.
>> On Solaris, it'd be cheap to test whether the file system is known
>> to be uncompressed, which allow the heuristic, but Solaris has
>> SEEK_HOLE so there's little point.
>
> Hi Paul,
>
> Without the st_blocks-based heuristic, grep's big-hole test now fails
> (exhausts memory and exits with status 2) on an ext4 file system with
> a recent linux kernel.
> That happens because while SEEK_HOLE and SEEK_DATA are now defined,
> the kernel's ext4 lseek/SEEK_HOLE support is just a stub that simply
> returns the length of the file.

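To make the failure mode above concrete, here is a rough sketch (in Python rather than grep's C, purely for brevity) that collects both signals for a file: the first hole reported by lseek(SEEK_HOLE) and the allocation reported by st_blocks. The names probe, SparseReport, and signals_disagree are hypothetical, the 512-byte st_blocks unit is an assumption that holds on Linux, and none of this is grep's actual code. A stub SEEK_HOLE that only returns the file length would show up as "st_blocks says sparse, SEEK_HOLE says dense", but a compressing file system can produce the same combination for a file with no holes, so the check cannot tell the two cases apart.

```python
import os
from collections import namedtuple

# Hypothetical record of the two signals discussed in this thread.
SparseReport = namedtuple("SparseReport", "size first_hole st_blocks")


def probe(path):
    """Collect st_size, st_blocks and the first SEEK_HOLE offset for a file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        st = os.fstat(fd)
        if st.st_size == 0:
            # lseek(SEEK_HOLE) fails with ENXIO at or past EOF, so skip it.
            first_hole = 0
        else:
            # Every file has an implicit hole at EOF, so a file without
            # holes (or a stub implementation) yields st_size here.
            first_hole = os.lseek(fd, 0, os.SEEK_HOLE)
        return SparseReport(st.st_size, first_hole, st.st_blocks)
    finally:
        os.close(fd)


def seek_hole_reports_hole(report):
    """True if SEEK_HOLE found a hole before the end of the file."""
    return report.first_hole < report.size


def blocks_suggest_sparse(report):
    """st_blocks-style heuristic; assumes 512-byte units, as on Linux."""
    return report.st_blocks * 512 < report.size


def signals_disagree(report):
    """st_blocks hints at holes but SEEK_HOLE reports none.

    Consistent with a stub SEEK_HOLE *or* with a compressing file
    system; the two cannot be told apart from these values alone.
    """
    return blocks_suggest_sparse(report) and not seek_hole_reports_hole(report)
```

On a kernel with the stub behavior, probing a large sparse file would be expected to give first_hole equal to size, so signals_disagree would return True there; that ambiguity is why neither signal is sufficient by itself.
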
Does FIEMAP give any better answer for ext4, while waiting for newer
kernels to properly implement SEEK_HOLE?

This is yet another argument for why the kernel should give us an
interface for quickly detecting whether a file is sparse; I wonder if
the proposed xstat() would be such an interface, and what the status of
that is.

I will be speaking at the Linux Plumbers Conference in San Diego in one
month, and I will bring up this topic as one of my concerns about how
the kernel folks can make life easier for applications dealing with
sparse files.

http://summit.linuxplumbersconf.org/lpc-2012/meeting/33/lpc2012-ref-improved-virt-disk-handling/

-- 
Eric Blake [email protected] +1-919-301-3266
Libvirt virtualization library http://libvirt.org
