Unravelling the mysteries of F_PREALLOCATE (on APFS and HFS+)

James Bucanek Wed, 13 Dec 2017 10:30:50 -0800

Greetings,

I'm trying to determine exactly what F_PREALLOCATE does and how to effectively use it.

I recently had an F_PREALLOCATE bug report thrown back at me with "this is a bug in your code". I'd like to disagree, and point out that my F_PREALLOCATE call is perfectly valid and it's fcntl() that is not doing what it should. But after drilling down on this, I came to a disturbing realization:


I don't know, exactly, how F_PREALLOCATE works.

Here's the problem:

I have an app that writes to a very large (GB to TB) file. It's essentially a database, consisting of relatively small data and control records. While performing a transaction, it will append a bunch of new data records. To complete the transaction, it must write a modest number of control records to tie together the data records and record the transaction.

If, for any reason, the transaction is interrupted, I want to guarantee that the necessary control records can be written in order to complete the transaction and leave the file in a valid state. One of these interruptions would be, obviously, running out of free disk space.


Here was my solution:

My "solution" to this was to begin by requesting a small F_PREALLOCATE (say, 2MB) before starting.

After writing some new data records (appending, say, an additional 1MB), I would assume that there is now only 1MB of pre-allocated file space remaining as a safety net. At this point, the code performs another 2MB F_PREALLOCATE so the file again has 2MB of preallocated space to finish its transaction.

This logic repeats, indefinitely, until the F_PREALLOCATE returns an out-of-disk-space error, which cancels the transaction and uses the previously preallocated disk space to wrap up.
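The loop described above could be sketched roughly as follows. This is only an illustration: `reserve_fn`, `run_transaction`, `stub_reserve` and `demo_transaction` are hypothetical names, and the stub merely simulates a disk filling up; a real callback would issue the F_PREALLOCATE request through fcntl().

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback: top the file's reserve back up to `bytes`.
 * Returns 0 on success or an errno value (e.g. ENOSPC) on failure. */
typedef int (*reserve_fn)(int fd, off_t bytes);

/* Append up to `count` copies of `rec`, re-requesting the reserve after
 * each append. Stops appending when a reserve request fails, then writes
 * `ctl` so the file ends in a valid state. Returns the number of data
 * records appended, or -1 if the transaction could not start or finish. */
int run_transaction(int fd, reserve_fn reserve, off_t reserve_bytes,
                    const void *rec, size_t rec_len, int count,
                    const void *ctl, size_t ctl_len)
{
    int written = 0;

    /* The safety net must exist before the first data record goes out. */
    if (reserve(fd, reserve_bytes) != 0)
        return -1;

    while (written < count) {
        if (write(fd, rec, rec_len) != (ssize_t)rec_len)
            break;                  /* short write or error: wind up */
        written++;
        if (reserve(fd, reserve_bytes) != 0)
            break;                  /* e.g. ENOSPC: stop appending */
    }

    /* Assumption: the control record fits in the space reserved earlier. */
    if (write(fd, ctl, ctl_len) != (ssize_t)ctl_len)
        return -1;
    return written;
}

/* Stand-in for demonstration only: pretends the disk fills up after
 * `stub_budget` successful requests. */
static int stub_budget = 3;
static int stub_reserve(int fd, off_t bytes)
{
    (void)fd; (void)bytes;
    return stub_budget-- > 0 ? 0 : ENOSPC;
}

/* Demo: run against a scratch file; returns the final file size or -1. */
long demo_transaction(void)
{
    char path[] = "/tmp/txn_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    stub_budget = 3;
    int n = run_transaction(fd, stub_reserve, 2 * 1024 * 1024,
                            "data", 4, 10, "ctl!", 4);
    off_t size = lseek(fd, 0, SEEK_END);
    close(fd);
    unlink(path);
    return n < 0 ? -1 : (long)size;
}
```

Note that the whole scheme rests on the assumption that each request leaves at least the requested amount of space reserved past the data already written, which is exactly what the questions below call into doubt.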


And here's the can of worms:

This seemed to work just fine on HFS+ (as far as I can tell). Then my APFS customers started getting weird errors (error 22, invalid parameter) from the F_PREALLOCATE request. So I filed a bug.

Now in trying to defend this bug, I realize I have a lot more questions about F_PREALLOCATE than the documentation (what little there is) addresses.

The F_PREALLOCATE command passes an fstore_t structure with the following fields:

fst_flags: a combination of F_ALLOCATECONTIG (request a "contiguous allocation") and F_ALLOCATEALL ("allocate all of the requested space or fail and allocate nothing"). These seem pretty clear, and I don't use either.

fst_posmode: this must be either F_PEOFPOSMODE ("allocate from the physical eof") or F_VOLPOSMODE ("allocate from volume offset"). I have no idea what the latter means, but since I want additional space past the file's eof to get preallocated I've always used F_PEOFPOSMODE.

Which brings me to my first (and biggest) question: F_PEOFPOSMODE allocates from the "physical" end of file. What is the physical end of file? Let's say I have a 1MB file and request a 2MB preallocation. Afterwards, is the "physical" eof 1MB or 3MB? If I perform another 2MB preallocation will the preallocated space remain at 2MB or will it grow to 4MB? If the latter, how does one determine the "physical" end of file?
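One way to at least observe what a preallocation did might be to compare the logical size with the allocated size reported by fstat(). The sketch below uses a hypothetical helper name, `file_extents`, and assumes st_blocks is counted in 512-byte units (as the stat man pages on macOS and Linux describe). Whether preallocated-but-unwritten space shows up in st_blocks on a given filesystem is an assumption to verify, not something established here.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: report the logical size (st_size) and the number of
 * bytes the filesystem says are allocated to the file (st_blocks * 512).
 * Returns 0 on success, or -1 with errno set by fstat(). */
int file_extents(int fd, off_t *logical, off_t *allocated)
{
    struct stat st;

    if (fstat(fd, &st) == -1)
        return -1;

    *logical = st.st_size;
    /* Assumption: st_blocks is expressed in 512-byte units. */
    *allocated = (off_t)st.st_blocks * 512;
    return 0;
}
```

Calling this before and after each F_PREALLOCATE request would show whether the allocated figure grows by 2MB each time or stays put.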

fst_offset and fst_length: The offset to the start of the "region" and the length of the preallocation request. I've always assumed (see fst_posmode) that the offset was relative to the file's logical EOF, but now I'm not sure.

fst_bytesalloc: This is the return field that reports the amount of space actually allocated. The documentation says "the space that is allocated can be the same size or larger than the space requested". That always made sense to me. If I requested a paltry 3 bytes, I'm sure the filesystem would round that up at least to the nearest block size.

Problem/question number two: For HFS+ volumes, the fst_bytesalloc returned was always the size I requested (unless the drive was out of space). In APFS, however, I get numbers much smaller than what was requested, even when the return value indicates success, in direct contradiction of the documentation. For example, after performing a few preallocations, the next request might be to preallocate 3MB, but the value returned in fst_bytesalloc will be 20K.
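For reference, the request being discussed looks roughly like this. It is a minimal sketch, not the app's actual code: `preallocate_at_eof` is a hypothetical name, fst_offset is set to 0 as an assumption about what F_PEOFPOSMODE expects, and the `#ifdef` only lets the snippet compile on systems that lack F_PREALLOCATE.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: request `length` bytes past the physical EOF.
 * On success returns 0 and stores fst_bytesalloc in *granted so the caller
 * can compare it with what was requested. On failure returns an errno value. */
int preallocate_at_eof(int fd, off_t length, off_t *granted)
{
#ifdef F_PREALLOCATE
    fstore_t fst;

    memset(&fst, 0, sizeof fst);
    fst.fst_flags   = 0;              /* neither F_ALLOCATECONTIG nor F_ALLOCATEALL */
    fst.fst_posmode = F_PEOFPOSMODE;  /* allocate from the physical EOF */
    fst.fst_offset  = 0;              /* assumption: 0 with F_PEOFPOSMODE */
    fst.fst_length  = length;

    if (fcntl(fd, F_PREALLOCATE, &fst) == -1)
        return errno;

    *granted = fst.fst_bytesalloc;    /* may differ from `length` */
    return 0;
#else
    /* F_PREALLOCATE is not available on this platform. */
    (void)fd; (void)length;
    *granted = 0;
    return ENOTSUP;
#endif
}
```

Logging `length` next to `*granted` for every call is a simple way to capture the discrepancy described above (3MB requested, 20K reported) for a bug report.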

Finally, I've got some new questions as I explore using "holes" and sparse files. Specifically, if I punch a "hole" in a file with F_PUNCHHOLE, can I later use F_PREALLOCATE to re-allocate those blocks before I write into them? If so, how would one determine the offset of a hole when setting up fst_posmode, fst_offset, and fst_length?
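Locating a hole, at least, may be possible with lseek() and SEEK_HOLE/SEEK_DATA where the platform and filesystem support them. The following is a sketch under that assumption, with a hypothetical helper name `next_hole`. It says nothing about whether F_PREALLOCATE will accept the resulting offsets.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes SEEK_HOLE/SEEK_DATA on glibc */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: find the first hole at or after `from`.
 * On success returns 0 and stores the hole's start and end offsets; a
 * zero-length result at EOF means no real hole was found. Returns -1 with
 * errno set on failure. Note: this moves the descriptor's file position. */
int next_hole(int fd, off_t from, off_t *hole_start, off_t *hole_end)
{
#if defined(SEEK_HOLE) && defined(SEEK_DATA)
    off_t start = lseek(fd, from, SEEK_HOLE);
    if (start == -1)
        return -1;

    off_t end = lseek(fd, start, SEEK_DATA);
    if (end == -1) {
        if (errno != ENXIO)
            return -1;
        /* ENXIO: no data after `start`, so the hole runs to EOF. */
        end = lseek(fd, 0, SEEK_END);
        if (end == -1)
            return -1;
    }

    *hole_start = start;
    *hole_end = end;
    return 0;
#else
    (void)fd; (void)from; (void)hole_start; (void)hole_end;
    errno = ENOTSUP;
    return -1;
#endif
}
```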

I'm just hoping there are fcntl() and/or APFS gurus out there that know the answers to these questions.


James Bucanek

Filesystem-dev mailing list      (Filesystem-dev@lists.apple.com)