Greetings,

I'm trying to determine exactly what F_PREALLOCATE does and how to effectively use it.

I recently had an F_PREALLOCATE bug report thrown back at me with "this is a bug in your code". I'd like to disagree, and point out that my F_PREALLOCATE call is perfectly valid and it's fcntl() that is not doing what it should. But after drilling down on this, I came to a disturbing realization:

I don't know, exactly, how F_PREALLOCATE works.

Here's the problem:

I have an app that writes to a very large (GB to TB) file. It's essentially a database, consisting of relatively small data and control records. While performing a transaction, it will append a bunch of new data records. To complete the transaction, it must write a modest number of control records to tie together the data records and record the transaction.

If, for any reason, the transaction is interrupted, I want to guarantee that the necessary control records can be written in order to complete the transaction and leave the file in a valid state. One obvious interruption would be running out of free disk space.

Here was my solution:

Before starting a transaction, I request a small F_PREALLOCATE (say, 2MB).

After writing some new data records (appending, say, an additional 1MB), I would assume that there is now only 1MB of pre-allocated file space remaining as a safety net. At this point, the code performs another 2MB F_PREALLOCATE so the file again has 2MB of preallocated space to finish its transaction.

This logic repeats indefinitely until F_PREALLOCATE returns an out-of-disk-space error. At that point the code cancels the transaction and uses the previously preallocated disk space to wrap up.
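For concreteness, the top-up step described above can be sketched roughly as follows. This is a minimal sketch, not the actual application code: top_up_reserve is a hypothetical helper name, passing 0 for fst_offset with F_PEOFPOSMODE is an assumption, and the #ifdef guard exists only so the snippet still compiles on systems without F_PREALLOCATE.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask for `reserve` more bytes past the physical EOF.
 * Returns 0 on success and stores the granted byte count in *granted;
 * returns -1 with errno set on failure. */
int top_up_reserve(int fd, off_t reserve, off_t *granted)
{
#ifdef F_PREALLOCATE
    fstore_t fst;
    memset(&fst, 0, sizeof fst);
    fst.fst_flags   = 0;              /* neither F_ALLOCATECONTIG nor F_ALLOCATEALL */
    fst.fst_posmode = F_PEOFPOSMODE;  /* allocate from the physical EOF */
    fst.fst_offset  = 0;              /* assumption: 0 is what F_PEOFPOSMODE expects */
    fst.fst_length  = reserve;
    if (fcntl(fd, F_PREALLOCATE, &fst) == -1)
        return -1;                    /* e.g. ENOSPC, or the EINVAL seen on APFS */
    *granted = fst.fst_bytesalloc;    /* space the filesystem reports it allocated */
    return 0;
#else
    (void)fd; (void)reserve;
    *granted = 0;
    errno = ENOTSUP;                  /* F_PREALLOCATE is not available here */
    return -1;
#endif
}
```

The caller would invoke this after each batch of appended records, for example top_up_reserve(fd, 2 * 1024 * 1024, &granted), and treat a -1 return as the signal to stop appending and finish the transaction.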

And here's the can of worms:

This seemed to work just fine on HFS+ (as far as I can tell). Then my APFS customers started getting weird errors (error 22, invalid parameter) from the F_PREALLOCATE request. So I filed a bug.

Now, in trying to defend this bug report, I realize I have a lot more questions about F_PREALLOCATE than the documentation (what little there is) addresses.

The F_PREALLOCATE command passes an fstore_t structure with the following fields:

fst_flags: a combination of F_ALLOCATECONTIG (request a "contiguous allocation") and F_ALLOCATEALL ("allocate all of the requested space or fail and allocate nothing"). These seem pretty clear, and I don't use either.

fst_posmode: this must be either F_PEOFPOSMODE ("allocate from the physical eof") or F_VOLPOSMODE ("allocate from volume offset"). I have no idea what the latter means, but since I want additional space past the file's eof to get preallocated I've always used F_PEOFPOSMODE.

Which brings me to my first (and biggest) question: F_PEOFPOSMODE allocates from the "physical" end of file. What is the physical end of file? Let's say I have a 1MB file and request a 2MB preallocation. Afterwards, is the "physical" eof 1MB or 3MB? If I perform another 2MB preallocation will the preallocated space remain at 2MB or will it grow to 4MB? If the latter, how does one determine the "physical" end of file?
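One way to probe this empirically would be to compare the file's logical size with the space the filesystem reports as allocated, before and after each F_PREALLOCATE. The following is a minimal sketch under a loud assumption: that st_blocks reflects preallocated-but-unwritten space, which the documentation does not confirm. file_sizes is a hypothetical helper name.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: report the logical size (st_size) and the space
 * the filesystem says is allocated (st_blocks, counted in 512-byte
 * units on macOS and Linux). Returns 0 on success, -1 with errno set. */
int file_sizes(int fd, off_t *logical, off_t *allocated)
{
    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;
    *logical   = st.st_size;                 /* logical EOF */
    *allocated = (off_t)st.st_blocks * 512;  /* assumed proxy for the physical EOF */
    return 0;
}
```

If the allocated figure grows by 2MB on every request while the logical size stays put, the preallocations are accumulating; if it stays at logical size plus 2MB, they are not.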

fst_offset and fst_length: The offset to the start of the "region" and the length of the preallocation request. I've always assumed (see fst_posmode) that the offset was relative to the file's logical EOF, but now I'm not sure.

fst_bytesalloc: This is the return field that reports the amount of space actually allocated. The documentation says "the space that is allocated can be the same size or larger than the space requested". That always made sense to me. If I requested a paltry 3 bytes, I'm sure the filesystem would round that up at least to the nearest block size.

Problem/question number two: For HFS+ volumes, the fst_bytesalloc returned was always the size I requested (unless the drive was out of space). In APFS, however, I get numbers much smaller than what was requested, even when the return value indicates success, in direct contradiction of the documentation. For example, after performing a few preallocations, the next request might be to preallocate 3MB, but the value returned in fst_bytesalloc will be 20K.

Finally, I've got some new questions as I explore using "holes" and sparse files. Specifically, if I punch a "hole" in a file with F_PUNCHHOLE, can I later use F_PREALLOCATE to re-allocate those blocks before I write into them? If so, how would one determine the offset of a hole when setting up fst_posmode, fst_offset, and fst_length?
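For reference, one possible way to locate a hole is lseek() with SEEK_HOLE. The following is only a sketch: next_hole is a hypothetical helper name, and whether the offsets it reports are usable as F_PREALLOCATE parameters on APFS is exactly the open question. The #ifdef guard is there so the snippet compiles where the headers do not expose SEEK_HOLE.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the offset of the first hole at or after
 * `start`, or -1 with errno set. Note that lseek() also moves the
 * descriptor's file offset as a side effect. */
off_t next_hole(int fd, off_t start)
{
#ifdef SEEK_HOLE
    return lseek(fd, start, SEEK_HOLE);
#else
    (void)fd; (void)start;
    errno = ENOTSUP;   /* SEEK_HOLE is not exposed by these headers */
    return -1;
#endif
}
```

A matching call with SEEK_DATA from the returned offset would give the end of the hole, and so its length.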

I'm just hoping there are fcntl() and/or APFS gurus out there that know the answers to these questions.

James Bucanek
