On 26 March 2012 17:08, David Korn <[email protected]> wrote:
> cc: [email protected]
> Subject: Re: [ast-users] Implementing SEEK_HOLE, SEEK_DATA in AST cp, mv, pax
> --------
>
>> are there plans to implement support for SEEK_HOLE (let lseek() seek
>> to the next hole in a sparse file) and SEEK_DATA (let lseek() seek to
>> the next place with real data, usually after a hole) in AST cp, mv and
>> pax in the next 2-3 months? This has become VERY important to
>> enterprise customers now that Linux+btrfs, GNU coreutils, Solaris,
>> FreeBSD and others support this feature and that it is going to be
>> included in the next iteration of the POSIX standard
>> (http://man7.org/linux/man-pages/man2/lseek.2.html)
>>
>>
>
> The AST tools do not use read/write/lseek directly but use SFIO for all
> input and output. Currently when writing a file and there are more than
> a block full of 0 bytes, SFIO seeks to the next non-zero byte so that
> the file can be created with holes. The result is that cp foo bar will
> create the file bar with the same or less space than bar.

This is AFAIK incorrect behaviour. There is a distinct difference
between a sequence of bytes with the value zero and holes in a sparse
file. The holes represent an area where NO real data are stored, but
reading this region will return zero bytes to avoid confusing
applications which are not aware of sparse files. But the difference
between 'no data' and 'bytes with value zero' is very significant. For
example, imagine a single sparse 20PB (20 petabyte) file with valuable
research data, where holes represent areas where no data were collected
while areas with zero bytes represent collected data with the value
zero. The real-world application is databases, which have been creating
sparse files of almost unlimited size for many years, so sparse file
support in commands is very important. It also has performance
implications: a mv/cp/pax which is not aware of sparse files will try to
copy the holes as sequences of zero bytes, wasting significant amounts
of time.

> I don't
> understand how mv is effected by SEEK_HOLE and SEEK_DATA since this is
> just a rename operation.

mv will copy the data if the move is between filesystems.

> pax will create holes whenever the input
> file contain large numbers of 0 bytes.
>
> In order to take advantage of SEEK_HOLE and SEEK_DATA, we will have to
> allow these to be passed into sfseek(). I don't know what gain will
> be achieved over the current method except possibly on read.

The gain is that the AST commands will be able to replicate sparse files
in a 1:1 manner. Currently AST cp, mv and pax on Linux, FreeBSD and
Solaris don't do that: they either corrupt sparse files or mangle files
with large areas of real data represented by zero bytes into sparse
files. At least the Oracle database and SAP products don't like it; both
either crash or report corrupted files and demand a long repair
procedure to undo the damage. Otherwise users will have to use GNU
coreutils for all work related to sparse files.
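
To illustrate what a 1:1 replication could look like, here is a minimal sketch of a copy loop built directly on lseek() with SEEK_DATA and SEEK_HOLE. It is not AST or SFIO code: the function names copy_extent, sparse_copy and sparse_copy_path are hypothetical, and the fallback values for SEEK_DATA/SEEK_HOLE are an assumption for headers that do not expose them (3 and 4 are the values Linux uses). Only regions the filesystem reports as data are read and written, so zero bytes that are really stored stay stored and holes stay holes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* glibc exposes SEEK_DATA/SEEK_HOLE under this */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA              /* assumption: Linux values, used only as a fallback */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/* Copy len bytes at offset off from in_fd to the same offset in out_fd. */
static int copy_extent(int in_fd, int out_fd, off_t off, off_t len)
{
    char buf[65536];

    while (len > 0) {
        size_t want = len < (off_t)sizeof buf ? (size_t)len : sizeof buf;
        ssize_t n = pread(in_fd, buf, want, off);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) break;     /* source shrank while we were copying */

        ssize_t done = 0;
        while (done < n) {
            ssize_t w = pwrite(out_fd, buf + done, (size_t)(n - done), off + done);
            if (w < 0) {
                if (errno == EINTR) continue;
                return -1;
            }
            done += w;
        }
        off += n;
        len -= n;
    }
    return 0;
}

/* Hypothetical helper: replicate data regions only, leaving holes as holes. */
int sparse_copy(int in_fd, int out_fd)
{
    struct stat st;
    off_t pos = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    if (fstat(in_fd, &st) < 0) return -1;

    while (pos < st.st_size) {
        off_t data = lseek(in_fd, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO) break;   /* only a trailing hole is left */
            return -1;
        }
        off_t hole = lseek(in_fd, data, SEEK_HOLE);
        if (hole < 0) return -1;
        if (hole <= data) break;         /* file changed underneath us */
        if (copy_extent(in_fd, out_fd, data, hole - data) < 0) return -1;
        pos = hole;
    }
    /* Set the final length so that a trailing hole is reproduced too. */
    return ftruncate(out_fd, st.st_size);
}

/* Convenience wrapper working on path names. */
int sparse_copy_path(const char *src, const char *dst)
{
    int in_fd = open(src, O_RDONLY);
    if (in_fd < 0) return -1;
    int out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out_fd < 0) { close(in_fd); return -1; }
    int rc = sparse_copy(in_fd, out_fd);
    if (close(out_fd) < 0) rc = -1;
    close(in_fd);
    return rc;
}
```

A caller would use something like sparse_copy_path("foo", "bar"). On a filesystem without hole support, lseek() should report the whole file as one data region, so the loop degrades to a plain copy. How such a loop would map onto sfseek() inside SFIO is a separate question that this sketch does not try to answer.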
