Paul Eggert wrote:
> On 07/30/2012 12:33 PM, Jim Meyering wrote:
>
>> - the interface is cumbersome (putting it mildly)
>
> Yes, and I remember that FIEMAP had some real bugs when the
> data structure on disk didn't match the data structure in
> memory. Dunno if they're fixed. Even if they are fixed,
> I'd reeeally rather just deal with SEEK_HOLE -- it's a
> *much* nicer interface.
>
>> it may be enough to use the old heuristic, but treat a file
>> as non-sparse when it has st.st_size <= ST_BLKSIZE(st).
>
> That would mishandle compressed file systems. Say the file is
> 5 MB of text, but file system compression squashes it down to 1 MB.
> Then st_size is 5 MB whereas st_blocks is just 1 MB,
> and grep would incorrectly think that the file has a hole
> and therefore is a binary file.
>
> Since the test is marked as expensive, how about if we just
> leave things as-is? Most people don't run expensive tests,
> and people who run them on inadequate file systems and with
> inadequate kernels that can't do 'ulimit -v' will just have
> to watch out (or buy machines with 10 TB of RAM ...).

:-)

I think that for now, at least with ext2, ext3, ext4 and tmpfs, grep
can resort to a file system type check (cached per-device
statvfs.f_fsid). Hmm.. maybe better to test
"is local_fs and ! is_compressing_fs_type(f_fsid)" since there aren't
many of those, while we'll probably want to use the heuristic also
for FAT*, NTFS, HFS, etc.

Given the knowledge that we're using one of those non-compressing
file systems, the legacy heuristic will work.
Otherwise, I find it too onerous to search a hierarchy and
watch grep appear to hang while it consumes all virtual memory -- only
to die (exit 2 or OOM-kill), interrupting the search.

>> The arguments for switching from ext4 to btrfs are adding up...
>
> I rely on you for notes from the bleeding edge....

I would have switched a year or so ago if ext4 weren't so much faster
when e.g., removing many small files.
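
For reference, here is a rough sketch in C of the three checks discussed above. It is an illustration under stated assumptions, not grep's actual code. The function names are hypothetical (is_compressing_fs_type is borrowed from the message, but its signature is assumed here), st_blksize stands in for gnulib's ST_BLKSIZE macro, st_blocks is assumed to count 512-byte units as on Linux, and the file system type is assumed to come from the Linux-specific statfs(2) f_type field rather than from f_fsid.

```c
#define _GNU_SOURCE             /* exposes SEEK_HOLE on glibc */
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Legacy heuristic plus the proposed small-file guard.
   Assumption: st_blocks is in 512-byte units (true on Linux).  */
static bool
looks_sparse_by_blocks (struct stat const *st)
{
  assert (st != NULL);

  /* Proposed guard: a file no larger than one I/O block is
     treated as non-sparse.  */
  if (st->st_size <= st->st_blksize)
    return false;

  /* Fewer allocated bytes than the nominal size suggests a hole.
     On a compressing file system this also fires for ordinary
     compressed text, which is the objection raised above.  */
  return (long long) st->st_blocks * 512 < (long long) st->st_size;
}

/* SEEK_HOLE probe: 1 if FD has a hole before SIZE, 0 if not,
   -1 if we cannot tell.  The file offset is restored on success.  */
static int
has_hole_by_seek (int fd, off_t size)
{
#ifdef SEEK_HOLE
  off_t cur = lseek (fd, 0, SEEK_CUR);
  if (cur < 0)
    return -1;
  off_t hole = lseek (fd, 0, SEEK_HOLE);
  if (lseek (fd, cur, SEEK_SET) < 0 || hole < 0)
    return -1;
  return hole < size;
#else
  (void) fd;
  (void) size;
  return -1;
#endif
}

/* Hypothetical type test.  FS_TYPE is assumed to be the f_type
   value from Linux statfs(2); only btrfs is listed here, and a
   real list would need more entries.  */
static bool
is_compressing_fs_type (unsigned long fs_type)
{
  return fs_type == 0x9123683EUL;       /* BTRFS_SUPER_MAGIC */
}
```

A caller could try has_hole_by_seek first and fall back to looks_sparse_by_blocks only when the probe returns -1 and is_compressing_fs_type is false for the file's device. With st_size of 5 MB and st_blocks worth 1 MB, looks_sparse_by_blocks returns true, which is exactly the compressed-file misfire described in the quoted text.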
