On 4 Sep 2014, at 09:33, Lionel Cons <[email protected]> wrote:
> On 4 September 2014 03:37, Jeffrey Altman <[email protected]> wrote:
>> On 9/3/2014 1:29 PM, Lionel Cons wrote:
>>> Will AFS3 include support for sparse files, e.g. files which have one
>>> or multiple holes (see POSIX lseek() documentation about SEEK_HOLE and
>>> SEEK_DATA) where no data reside?

I don’t think SEEK_HOLE and SEEK_DATA are in Issue 7. There’s text
defining them proposed as part of Issue 8 - see
http://austingroupbugs.net/view.php?id=415 for the accepted text.

> There is one very large, and very bad misconception:
> Holes in a sparse file return '\0' (zero) bytes on read(), but it does
> not mean that all data which contain '\0' bytes are holes. This
> misconception can lead to serious data loss situations because
> applications need to be able to differ between holes in a file -
> representing 'no data here' and large ranges containing '\0' bytes -
> which represent valid data containing '\0' bytes.

I’m surprised that applications can actually get away with making these
differentiations, given the way that sparse files are typically
implemented. In particular, there’s no portable guarantee of the
granularity of a hole. Your attempt to leave a 12 byte hole may be
rejected by the filesystem and replaced by 12 zero bytes, and your
attempt to leave a 5000 byte hole may become a 4096 byte hole followed
by 904 zero bytes.

This is particularly true of the mmap use case, where writing a single
byte is likely to result in the whole page that you’ve written to
becoming non-sparse. On filesystems which manage sparse files using disk
blocks, it will probably also result in the whole block containing that
byte becoming allocated.

What level of granularity do your applications expect to have for sparse
holes? What interfaces are they using to create the holes themselves -
are they just doing seek then write, or are they using things like
fallocate() with the zero_range or punch_hole options?

Cheers,

Simon
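
P.S. For anyone who wants to check the granularity point on their own
filesystem, here is a minimal sketch. It assumes a platform where Python
exposes os.SEEK_DATA and os.SEEK_HOLE (Linux, for example). The helper
names data_extents() and probe() are made up for illustration. It writes
one byte, seeks past a gap, writes another byte, and then asks the
filesystem which ranges it reports as data.

```python
import errno
import os
import tempfile


def data_extents(fd):
    """Return the (start, end) byte ranges the filesystem reports as data."""
    size = os.fstat(fd).st_size
    extents = []
    offset = 0
    while offset < size:
        try:
            # First offset >= `offset` that is reported as data.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:  # only a hole remains up to EOF
                break
            raise
        # First hole after that data; EOF counts as an implicit hole.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        extents.append((start, end))
        offset = end
    return extents


def probe(gap):
    """Write 'A', skip `gap` bytes with lseek(), write 'B'; report the layout."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        os.write(fd, b"A")
        os.lseek(fd, gap, os.SEEK_CUR)  # seek past EOF; nothing is written here
        os.write(fd, b"B")
        return os.fstat(fd).st_size, data_extents(fd)


if __name__ == "__main__":
    for gap in (12, 5000, 1 << 20):
        size, extents = probe(gap)
        print(f"gap={gap:>8} size={size:>8} data extents={extents}")
```

The output depends on the filesystem, so none is shown here. A
filesystem that does not track holes at all may simply report the whole
file as one data extent, and one that does track them is free to round
the hole to its own block or page size.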
