http://lwn.net/Articles/260795/

SEEK_HOLE or FIEMAP?

The process of recognizing holes is relatively primitive, though: about the only portable way to do it is to look for blocks filled with zeroes. This technique works, but it requires making a pass over the data to obtain information which the lower levels of the system already know. It seems like there should be a better way.

About two years ago, the Solaris ZFS developers proposed an extension to lseek() which would allow an application to find the holes in sparse files more efficiently. This extension works by adding two new "whence" options, SEEK_HOLE and SEEK_DATA, which move the file position to the start of the next hole or of the next region of data, respectively.
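As an illustration of how an application might use the two lseek() options, SEEK_HOLE and SEEK_DATA, the following is a minimal sketch that walks the data regions of an open file. It assumes the semantics of the Solaris extension as it is exposed by current Linux and glibc (where _GNU_SOURCE must be defined to see the constants), which is not necessarily what the patch discussed here provides. The function name list_data_segments() is hypothetical.

```c
#define _GNU_SOURCE             /* glibc exposes SEEK_DATA / SEEK_HOLE under this */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: report each data region of fd as [start, end).
 * Returns the number of data regions found, or -1 on error.
 * If out is NULL, nothing is printed.
 */
long list_data_segments(int fd, FILE *out)
{
    off_t pos = 0;
    long count = 0;

    for (;;) {
        /* Find the next offset >= pos that contains data. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO)
                break;          /* no data at or after pos: we are done */
            return -1;
        }

        /* Find the end of that data region; end of file counts as a hole. */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        assert(hole > data);    /* a data region is never empty */

        if (out != NULL)
            fprintf(out, "data: [%lld, %lld)\n",
                    (long long)data, (long long)hole);
        count++;
        pos = hole;
    }
    return count;
}
```

A filesystem that cannot report holes may simply treat the whole file as one data region, so a caller should treat the result as a hint about layout and not as a guarantee that the gaps read back as anything other than zeroes.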
This functionality has been part of Solaris for a while; the Solaris developers would like to see it spread elsewhere and become something more than a Solaris-only extension. To that end, Josef Bacik has recently posted an implementation of this extension for Linux. Internally, it adds a new member to the file_operations structure (seek_hole_data()) intended to allow filesystems to efficiently implement the new operations.

One might argue that anybody who wants to separate holes and data in a file can already do so with the FIBMAP ioctl() command. While that is true, FIBMAP is an inefficient way of getting this sort of information, especially on filesystems which support extents. A FIBMAP call returns the mapping information for exactly one block; mapping out a large file may require millions of calls when, once again, the filesystem should already know how to provide that information in a much more straightforward manner.

Even so, this patch looks relatively unlikely to make it into the mainline. The API is unpopular, being seen as ugly and as a change in the semantics of the lseek() call. But, more to the point, it may be interesting to learn much more about the representation of a file than just where the holes are. And, as it turns out, there is already a proposed ioctl() command which can provide all of that information. That interface is the FIEMAP ioctl() specified by Andreas Dilger back in October.
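To make the per-block cost of FIBMAP mentioned above concrete, here is a minimal sketch that counts the unmapped blocks of a file by asking about every block in turn. It assumes a Linux system where FIBMAP and FIGETBSZ are available from <linux/fs.h>, a filesystem that implements FIBMAP, and a caller with sufficient privilege (FIBMAP normally requires CAP_SYS_RAWIO). The function name count_unmapped_blocks() is hypothetical.

```c
#include <assert.h>
#include <linux/fs.h>           /* FIBMAP, FIGETBSZ */
#include <sys/ioctl.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: count the blocks of fd that have no on-device
 * mapping (that is, holes). Returns the count, or -1 on error, which
 * includes lack of privilege or a filesystem without FIBMAP support.
 */
long count_unmapped_blocks(int fd)
{
    int blksz;
    struct stat st;

    if (ioctl(fd, FIGETBSZ, &blksz) < 0 || blksz <= 0)
        return -1;
    if (fstat(fd, &st) < 0)
        return -1;

    long nblocks = (long)((st.st_size + blksz - 1) / blksz);
    long holes = 0;

    /* One ioctl() per block: this loop is the inefficiency at issue. */
    for (long i = 0; i < nblocks; i++) {
        int blk = (int)i;       /* in: logical block, out: device block */
        if (ioctl(fd, FIBMAP, &blk) < 0)
            return -1;
        if (blk == 0)           /* 0 means "not mapped" */
            holes++;
    }
    assert(holes <= nblocks);
    return holes;
}
```

With a 4KB block size, a 4GB file takes about a million trips through this loop, which is the cost an extent-based interface is meant to avoid.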
A FIEMAP call takes the following structure as an argument:

    struct fiemap {
        __u64 fm_start;         /* logical starting byte offset (in/out) */
        __u64 fm_length;        /* logical length of map (in/out) */
        __u32 fm_flags;         /* FIEMAP_FLAG_* flags for request (in/out) */
        __u32 fm_extent_count;  /* number of extents in fm_extents (in/out) */
        __u64 fm_end_offset;    /* end of mapping in last ioctl */
        struct fiemap_extent fm_extents[0];
    };

An application wanting to learn something about how a file is stored will put the starting offset into fm_start and the length of the region of interest in fm_length. If fm_flags contains FIEMAP_FLAG_NUM_EXTENTS, the system call will simply set fm_extent_count to the number of extents used to store the specified range of bytes and return. In this form, FIEMAP can be used to determine how fragmented the file is on disk.

If the application is looking for more information than that, it will allocate enough space for one or more fm_extents structures:

    struct fiemap_extent {
        __u64 fe_offset;  /* offset in bytes for the start of the extent */
        __u64 fe_length;  /* length in bytes for the extent */
        __u32 fe_flags;   /* returned FIEMAP_EXTENT_* flags for the extent */
        __u32 fe_lun;     /* logical device number for extent (starting at 0) */
    };

In this case, fm_extent_count should be set to the number of these structures before making the FIEMAP call. On return, these structures (as many as is indicated by the returned value of fm_extent_count) will be filled in with information on the actual file extents; fe_offset says where (on disk) the extent starts, and fe_length is the size of the extent. There are quite a few values which can appear in the fe_flags field:
As can be seen, there is a wealth of information available from this new call, including details on how the file has been split up on disk, allocation strategies, and even the decisions made by a hierarchical storage engine. An implementation exists for the ext4 filesystem. None of this code has been pushed toward the mainline yet, but it would be surprising if that did not happen sometime in the relatively near future. Once that is done, the C library will be able to implement SEEK_HOLE and SEEK_DATA in user space, should that be desirable.
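As a rough sketch of how user-space code could derive hole information from an extent map, the following function asks for a file's extents and reports the gaps between them. Note the assumption: it uses the interface found in current kernel headers (<linux/fiemap.h> and FS_IOC_FIEMAP from <linux/fs.h>), whose field names differ from the proposal described in this article (for example fm_mapped_extents, fe_logical and fe_physical). The function name map_extents() and the MAX_EXTENTS cap are hypothetical.

```c
#include <assert.h>
#include <linux/fiemap.h>       /* struct fiemap, struct fiemap_extent */
#include <linux/fs.h>           /* FS_IOC_FIEMAP */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>

#define MAX_EXTENTS 32          /* arbitrary cap chosen for this sketch */

/*
 * Hypothetical helper: fetch up to MAX_EXTENTS extents of fd and report
 * them, along with any holes found before or between them. A hole after
 * the last extent is not reported. Returns the number of extents
 * returned by the kernel, or -1 on error (including filesystems with no
 * FIEMAP support). If out is NULL, nothing is printed.
 */
long map_extents(int fd, FILE *out)
{
    size_t size = sizeof(struct fiemap) +
                  MAX_EXTENTS * sizeof(struct fiemap_extent);
    struct fiemap *fm = calloc(1, size);
    if (fm == NULL)
        return -1;

    fm->fm_start = 0;                   /* map from the start of the file */
    fm->fm_length = FIEMAP_MAX_OFFSET;  /* ... to the end */
    fm->fm_flags = FIEMAP_FLAG_SYNC;    /* flush dirty data before mapping */
    fm->fm_extent_count = MAX_EXTENTS;  /* room available in fm_extents[] */

    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
        free(fm);
        return -1;
    }
    assert(fm->fm_mapped_extents <= MAX_EXTENTS);

    unsigned long long next = 0;        /* end of the previous extent */
    for (unsigned int i = 0; i < fm->fm_mapped_extents; i++) {
        const struct fiemap_extent *e = &fm->fm_extents[i];
        unsigned long long start = e->fe_logical;
        unsigned long long len = e->fe_length;

        if (out != NULL && start > next)
            fprintf(out, "hole:   [%llu, %llu)\n", next, start);
        if (out != NULL)
            fprintf(out, "extent: [%llu, %llu) at device offset %llu\n",
                    start, start + len,
                    (unsigned long long)e->fe_physical);
        next = start + len;
    }

    long n = (long)fm->fm_mapped_extents;
    free(fm);
    return n;
}
```

A file with more than MAX_EXTENTS extents would need repeated calls, each starting where the previous one ended; that loop is left out to keep the sketch short.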
SEEK_HOLE or FIEMAP?
Posted Dec 6, 2007 15:10 UTC (Thu) by etienne_lorr...@yahoo.fr (subscriber, #38022)

> One might argue that anybody who wants to separate holes and data in
> a file can already do so with the FIBMAP ioctl() command.

Such an implementation at: http://www.mirrorservice.org/sites/download.sourceforge.n...
executable at: http://www.mirrorservice.org/sites/download.sourceforge.n...

That can be used to count the level of fragmentation of a filesystem, with some interesting results.

The main problem is that some filesystems do not implement it correctly or at all (so LILO or Gujin cannot be installed on them). The other problem, for the case of a bootloader, is that it does not give the position of the data on the disk but in the device, and there is a big difference when the device is a RAID or LVM.

What the bootloader has to do is record where its own code and data are on disk, so that it can load them without kernel support. To keep everything in a single file, it has to write the position of the end of the file at the beginning of that same file, which means having the blocks allocated on disk before the write into the file is finished - possible but tricky.
SEEK_HOLE or FIEMAP?
Posted Dec 7, 2007 1:55 UTC (Fri) by giraffedata (subscriber, #1954)

It's much cleaner to have the boot loader use the proper directories, block maps, etc. to access the filesystem. GRUB does this.

In its usual deployment, GRUB still has the problem because the code that knows how to access the filesystem is in the filesystem, and the only way GRUB knows to find it is with built-in block numbers. But it's possible to put that code outside the filesystem, in an area of disk reserved for that purpose, and then the world is as it should be. You don't need any special kernel interfaces at boot loader installation time, and you don't have to take care to keep the blocks from moving after you've installed the boot loader.
SEEK_HOLE or FIEMAP?
Posted Dec 7, 2007 10:25 UTC (Fri) by etienne_lorr...@yahoo.fr (subscriber, #38022)

<rant>

> It's much cleaner to have the boot loader use the proper directories, block maps, etc. to access the filesystem. GRUB does this.

So does Gujin - with a smaller number of filesystems supported, I have to say.

> In its usual deployment, GRUB still has the problem because that code that knows how to access the filesystem is in the filesystem, and the only way GRUB knows to find it is with built-in block numbers.

So does Gujin.

> But it's possible to put that code outside the filesystem, in an area of disk reserved for that purpose, and then the world is as it should be.

By default Gujin puts that code at the end of the disk, outside of any filesystem, but that space is not always available, depending on the tool used to create the partitions (Linux tools tend to fill the whole disk - not leaving a single unallocated sector for the bootloader).

</rant>

That doesn't change the fact that it would be nice to have a kernel interface which maps a device block to a hard disk block, for the part of the bootloader which must not move when it is on a filesystem (the RAID and LVM problem). It would also be nice to have an interface to tell the filesystem that this file is the boot code - there is an inode reserved for that in EXT2/3FS but no way to use it.
SEEK_HOLE or FIEMAP?
Posted Dec 13, 2007 13:28 UTC (Thu) by RobLucid (guest, #49530)

Wonder why good ole' partitions are out of fashion? Rather than giving applications the ability to do nasty things and become dependent on physical block numbers, which prevents copying files around, you could use a raw partition and copy the blocks to known offsets from the beginning of the partition. This seems much simpler.

Presumably a BootFS, with a boot-loader-friendly structure, might also be a robust alternative, and would avoid the duplication of files in the raw partition approach.