On 4 Sep 2014, at 09:33, Lionel Cons <[email protected]> wrote:

> On 4 September 2014 03:37, Jeffrey Altman <[email protected]> 
> wrote:
>> On 9/3/2014 1:29 PM, Lionel Cons wrote:
>>> Will AFS3 include support for sparse files, e.g. files which have one
>>> or multiple holes (see POSIX lseek() documentation about SEEK_HOLE and
>>> SEEK_DATA) where no data reside?

I don’t think SEEK_HOLE and SEEK_DATA are in Issue 7. There’s text defining 
them proposed as part of Issue 8 - see 
http://austingroupbugs.net/view.php?id=415 for the accepted text.

> There is one very large, and very bad misconception:
> Holes in a sparse file return '\0' (zero) bytes on read(), but it does
> not mean that all data which contain '\0' bytes are holes. This
> misconception can lead to serious data loss situations because
> applications need to be able to differ between holes in a file -
> representing 'no data here' and large ranges containing '\0' bytes -
> which represent valid data containing '\0' bytes.

I’m surprised that applications can actually get away with making these 
distinctions, given the way that sparse files are typically implemented. In 
particular, there’s no portable guarantee of the granularity of a hole. Your 
attempt to leave a 12 byte hole may be rejected by the filesystem and be 
replaced by 12 zero bytes, and your attempt to leave a 5000 byte hole may 
become a 4096 byte hole followed by 904 zero bytes.
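
As a rough illustration of the granularity point, here is a minimal Python 
sketch that seeks past a gap, writes one byte, and asks the filesystem where 
the first data region starts. The helper names (first_data_offset, probe_gap) 
are made up for this example. It assumes a platform where Python exposes 
os.SEEK_DATA, and falls back to treating the whole file as data otherwise.

```python
import os
import tempfile


def first_data_offset(fd):
    """Offset of the first data region as reported by SEEK_DATA.

    Returns 0 (the whole file is treated as data) if the platform or
    filesystem cannot answer the query.
    """
    seek_data = getattr(os, "SEEK_DATA", None)
    if seek_data is None:
        return 0
    try:
        return os.lseek(fd, 0, seek_data)
    except OSError:
        return 0


def probe_gap(gap, payload=b"x"):
    """Seek past `gap` bytes, write `payload`, and report what was kept.

    Returns (data_start, prefix): where the filesystem says data begins,
    and the bytes read back from the gap itself.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.lseek(fd, gap, os.SEEK_SET)   # skip the gap without writing
        os.write(fd, payload)
        os.fsync(fd)                     # make sure blocks are allocated
        data_start = first_data_offset(fd)
        prefix = os.pread(fd, gap, 0)    # the gap always reads back as zeros
        return data_start, prefix
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    for gap in (12, 5000):
        start, prefix = probe_gap(gap)
        print(f"requested gap {gap}: data reported from offset {start}")
```

Whatever offset comes back depends on the filesystem and its block size. The 
only thing the caller can rely on is that it is no larger than the gap asked 
for, and that the gap reads back as zero bytes either way.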

This is particularly true of the mmap use case where writing a single byte is 
likely to result in the whole page that you’ve written to becoming non-sparse. 
On filesystems which manage sparse files using disk blocks, it will probably 
also result in the whole block containing that byte becoming allocated.
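
The mmap case can be probed in a similar way. The sketch below is only an 
illustration: the function name and the sizes are arbitrary, and it assumes 
os.SEEK_DATA and os.SEEK_HOLE are available (otherwise it reports the whole 
file as data). It creates an empty file of a given size, dirties one byte 
through a shared mapping, and then asks how large the resulting data region is.

```python
import mmap
import os
import tempfile


def data_extent_after_mmap_write(size=1 << 20, offset=300_000):
    """Write one byte via mmap and return (start, end, st_blocks).

    start/end delimit the data region that contains the byte, as reported
    by SEEK_DATA/SEEK_HOLE; st_blocks is the allocation in 512-byte units.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)           # extend the file without writing data
        with mmap.mmap(fd, size) as mm:  # shared, read/write mapping on Unix
            mm[offset] = 0x41            # dirty exactly one byte
            mm.flush()
        os.fsync(fd)
        blocks = os.fstat(fd).st_blocks

        seek_data = getattr(os, "SEEK_DATA", None)
        seek_hole = getattr(os, "SEEK_HOLE", None)
        if seek_data is None or seek_hole is None:
            return 0, size, blocks       # cannot query: treat all as data
        try:
            start = os.lseek(fd, 0, seek_data)
            end = os.lseek(fd, start, seek_hole)
        except OSError:
            return 0, size, blocks
        return start, end, blocks
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    start, end, blocks = data_extent_after_mmap_write()
    print(f"one byte written; data region is [{start}, {end}), "
          f"{blocks} 512-byte blocks allocated")
```

On a filesystem that tracks holes, the reported region is normally at least a 
page or a block wide, even though only one byte was stored.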

What level of granularity do your applications expect to have for sparse holes? 
What interfaces are they using to create the holes themselves - are they just 
doing seek then write, or are they using things like Linux fallocate() with the 
FALLOC_FL_ZERO_RANGE or FALLOC_FL_PUNCH_HOLE flags?
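
For reference, this is roughly what the fallocate() route looks like from 
Python. It is a sketch under several assumptions: 64-bit Linux, a libc that 
exports fallocate(), and a filesystem that supports hole punching. Python has 
no wrapper for the flags argument, so the call goes through ctypes; punch_hole 
and demo are names invented for this example, and the flag values are copied 
from <linux/falloc.h>.

```python
import ctypes
import ctypes.util
import os
import tempfile

FALLOC_FL_KEEP_SIZE = 0x01    # value from <linux/falloc.h>
FALLOC_FL_PUNCH_HOLE = 0x02   # value from <linux/falloc.h>


def punch_hole(fd, offset, length):
    """Try to deallocate [offset, offset + length); return True on success.

    Returns False if fallocate() is unavailable or the filesystem refuses.
    """
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
        fallocate = libc.fallocate
    except (OSError, AttributeError):
        return False
    # Assumes a 64-bit off_t, as on 64-bit Linux.
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                          ctypes.c_int64, ctypes.c_int64]
    fallocate.restype = ctypes.c_int
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    return fallocate(fd, mode, offset, length) == 0


def demo(size=64 * 1024, offset=5000, length=20000):
    """Fill a file with 'x', punch an unaligned hole, and read it all back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * size)
        ok = punch_hole(fd, offset, length)
        data = os.pread(fd, size, 0)
        return ok, data, os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    ok, data, size = demo()
    print("hole punched" if ok else "punch not supported here")
```

When the call succeeds, the requested range reads back as zeros and the file 
size is unchanged. How much of the range is actually deallocated, rather than 
just zeroed, is still up to the filesystem.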

Cheers,

Simon

