Greetings,
I'm trying to determine exactly what F_PREALLOCATE does and how to
effectively use it.
I recently had an F_PREALLOCATE bug report thrown back at me with "this
is a bug in your code". I'd like to disagree, and point out that my
F_PREALLOCATE call is perfectly valid and it's fcntl() that is not doing
what it should. But after drilling down on this, I came to a disturbing
realization:
I don't know, exactly, how F_PREALLOCATE works.
Here's the problem:
I have an app that writes to a very large (GB to TB) file. It's
essentially a database, consisting of relatively small data and control
records. While performing a transaction, it will append a bunch of new
data records. To complete the transaction, it must write a modest number
of control records to tie together the data records and record the
transaction.
If, for any reason, the transaction is interrupted, I want to guarantee
that the necessary control records can be written in order to complete
the transaction and leave the file in a valid state. One of these
interruptions would be, obviously, running out of free disk space.
Here was my solution:
Begin by requesting a small F_PREALLOCATE (say, 2MB) before starting
the transaction.
After writing some new data records (appending, say, an additional 1MB),
I would assume that there is now only 1MB of pre-allocated file space
remaining as a safety net. At this point, the code performs another 2MB
F_PREALLOCATE so the file again has 2MB of preallocated space to finish
its transaction.
This logic repeats, indefinitely, until F_PREALLOCATE returns an
out-of-disk-space error, at which point the code cancels the transaction
and uses the previously preallocated disk space to wrap up.
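
To make that concrete, the loop looks roughly like the sketch below. This is a simplified illustration, not the actual application code: the names reserve_space, append_record, and RESERVE_BYTES are made up for this post, the fst_offset of 0 is an assumption, and the #else branch exists only so the sketch compiles on systems that don't have F_PREALLOCATE.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define RESERVE_BYTES (2 * 1024 * 1024) /* hypothetical safety-net size */

/* Request `length` bytes with no flags and F_PEOFPOSMODE.
 * fst_offset = 0 is an assumption made for this sketch.
 * On success returns 0 and stores fst_bytesalloc in *allocated;
 * on failure returns -1 with errno set by fcntl(). */
int reserve_space(int fd, off_t length, off_t *allocated)
{
    assert(allocated != NULL);
#ifdef F_PREALLOCATE
    fstore_t store;
    memset(&store, 0, sizeof store);
    store.fst_flags = 0; /* neither F_ALLOCATECONTIG nor F_ALLOCATEALL */
    store.fst_posmode = F_PEOFPOSMODE;
    store.fst_offset = 0;
    store.fst_length = length;
    if (fcntl(fd, F_PREALLOCATE, &store) == -1)
        return -1;
    *allocated = store.fst_bytesalloc;
    return 0;
#else
    (void)fd;
    (void)length;
    errno = ENOTSUP; /* F_PREALLOCATE is a macOS fcntl command */
    return -1;
#endif
}

/* Append one record at the current file offset (assumed to be the
 * logical EOF), then request another RESERVE_BYTES.
 * Returns 0 to keep going, -1 when the caller should stop appending
 * and finish the transaction using the space reserved earlier. */
int append_record(int fd, const void *buf, size_t len)
{
    off_t got = 0;
    if (write(fd, buf, len) != (ssize_t)len)
        return -1;
    return reserve_space(fd, RESERVE_BYTES, &got);
}
```

The caller keeps calling append_record() until it returns -1, then writes the control records. Whether each call actually restores the reserve to 2MB is exactly what I'm unsure about.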
And here's the can of worms:
This seemed to work just fine on HFS+ (as far as I can tell). Then my
APFS customers started getting weird errors (error 22, EINVAL, "invalid
argument") from the F_PREALLOCATE request. So I filed a bug.
Now in trying to defend this bug, I realize I have a lot more questions
about F_PREALLOCATE than the documentation (what little there is) addresses.
The F_PREALLOCATE command passes an fstore_t structure with the
following fields:
fst_flags: a combination of F_ALLOCATECONTIG (request a "contiguous
allocation") and F_ALLOCATEALL ("allocate all of the requested space or
fail and allocate nothing"). These seem pretty clear, and I don't use
either.
fst_posmode: this must be either F_PEOFPOSMODE ("allocate from the
physical eof") or F_VOLPOSMODE ("allocate from volume offset"). I have
no idea what the latter means, but since I want additional space past
the file's eof to get preallocated I've always used F_PEOFPOSMODE.
Which brings me to my first (and biggest) question: F_PEOFPOSMODE
allocates from the "physical" end of file. What is the physical end of
file? Let's say I have a 1MB file and request a 2MB preallocation.
Afterwards, is the "physical" eof 1MB or 3MB? If I perform another 2MB
preallocation will the preallocated space remain at 2MB or will it grow
to 4MB? If the latter, how does one determine the "physical" end of file?
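
One way to at least observe the difference between the two notions of "end of file" is to compare st_size with st_blocks from fstat(). The sketch below does that. The names file_extent_t, query_extent, and print_extent are invented for illustration. I'm assuming st_blocks counts 512-byte units (which is what the stat man page says), and I don't know that this is the same "physical eof" that F_PEOFPOSMODE uses, or that preallocated-but-unwritten space shows up in st_blocks on APFS.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

typedef struct {
    off_t logical_size;    /* st_size: the logical EOF */
    off_t allocated_bytes; /* st_blocks * 512: space the file occupies */
} file_extent_t;

/* Fill *out from fstat(). Returns 0 on success, -1 with errno set. */
int query_extent(int fd, file_extent_t *out)
{
    struct stat st;
    assert(out != NULL);
    if (fstat(fd, &st) == -1)
        return -1;
    out->logical_size = st.st_size;
    out->allocated_bytes = (off_t)st.st_blocks * 512;
    return 0;
}

/* Print one snapshot, e.g. before and after a preallocation request. */
void print_extent(const char *label, const file_extent_t *e)
{
    printf("%s: logical=%lld allocated=%lld\n", label,
           (long long)e->logical_size, (long long)e->allocated_bytes);
}
```

Calling query_extent() before and after each F_PREALLOCATE would show whether the allocated figure grows by 2MB each time or stays put.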
fst_offset and fst_length: The offset to the start of the "region" and
the length of the preallocation request. I've always assumed (see
fst_posmode) that the offset was relative to the file's logical EOF, but
now I'm not sure.
fst_bytesalloc: This is the return field that reports the amount of
space actually allocated. The documentation says "the space that is
allocated can be the same size or larger than the space requested". That
always made sense to me. If I requested a paltry 3 bytes, I'm sure the
filesystem would round that up at least to the nearest block size.
Problem/question number two: For HFS+ volumes, the fst_bytesalloc
returned was always the size I requested (unless the drive was out of
space). In APFS, however, I get numbers much smaller than what was
requested, even when the return value indicates success, in direct
contradiction of the documentation. For example, after performing a few
preallocations, the next request might be to preallocate 3MB, but the
value returned in fst_bytesalloc will be 20K.
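
For anyone who wants to reproduce this, a small diagnostic along the following lines captures what I'm describing. It is a sketch with made-up names (prealloc_result_t, try_preallocate, is_short_allocation, log_result), it leaves fst_flags and fst_offset at zero as an assumption, and the #else branch is only there so it compiles where F_PREALLOCATE doesn't exist.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

typedef struct {
    off_t requested; /* what went into fst_length */
    off_t reported;  /* what came back in fst_bytesalloc */
    int   rc;        /* fcntl() return value */
    int   err;       /* errno captured right after the call, 0 on success */
} prealloc_result_t;

/* Issue one F_PREALLOCATE request and record everything it returned. */
prealloc_result_t try_preallocate(int fd, off_t length)
{
    prealloc_result_t r = { length, 0, -1, 0 };
#ifdef F_PREALLOCATE
    fstore_t store;
    memset(&store, 0, sizeof store); /* fst_flags = 0, fst_offset = 0 */
    store.fst_posmode = F_PEOFPOSMODE;
    store.fst_length = length;
    r.rc = fcntl(fd, F_PREALLOCATE, &store);
    r.err = (r.rc == -1) ? errno : 0;
    r.reported = store.fst_bytesalloc;
#else
    (void)fd;
    r.err = ENOTSUP; /* no F_PREALLOCATE on this platform */
#endif
    return r;
}

/* Nonzero when the call claimed success but reported less than requested. */
int is_short_allocation(const prealloc_result_t *r)
{
    assert(r != NULL);
    return r->rc == 0 && r->reported < r->requested;
}

/* Write one line per request to stderr for inclusion in a bug report. */
void log_result(const prealloc_result_t *r)
{
    fprintf(stderr, "F_PREALLOCATE: requested=%lld reported=%lld rc=%d errno=%d%s\n",
            (long long)r->requested, (long long)r->reported, r->rc, r->err,
            is_short_allocation(r) ? " (short allocation)" : "");
}
```

The 3MB request that comes back as 20K is the case is_short_allocation() flags: the call succeeds, yet fst_bytesalloc is smaller than fst_length.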
Finally, I've got some new questions as I explore using "holes" and
sparse files. Specifically, if I punch a "hole" in a file with
F_PUNCHHOLE, can I later use F_PREALLOCATE to re-allocate those blocks
before I write into them? If so, how would one determine the offset of a
hole when setting up fst_posmode, fst_offset, and fst_length?
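
The only way I know of to locate a hole is lseek() with SEEK_HOLE and SEEK_DATA, roughly as sketched below. The names hole_t and find_hole are invented for this post. This only finds the hole; it says nothing about whether F_PREALLOCATE will accept that offset, which is the part I can't answer. The #else branch covers systems whose headers don't expose SEEK_HOLE and SEEK_DATA.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
    off_t offset; /* where the hole starts */
    off_t length; /* hole size, or -1 if no data follows it */
} hole_t;

/* Find the first hole at or after `start`. The end of the file counts
 * as a hole, so a file with no holes reports its own size as the offset.
 * Returns 0 on success, -1 with errno set if the search failed. */
int find_hole(int fd, off_t start, hole_t *out)
{
    assert(out != NULL);
#if defined(SEEK_HOLE) && defined(SEEK_DATA)
    off_t saved = lseek(fd, 0, SEEK_CUR); /* lseek moves the file offset */
    off_t hole = lseek(fd, start, SEEK_HOLE);
    if (hole == (off_t)-1)
        return -1;
    /* A failed SEEK_DATA is treated here as "no data after the hole". */
    off_t data = lseek(fd, hole, SEEK_DATA);
    out->offset = hole;
    out->length = (data == (off_t)-1) ? (off_t)-1 : data - hole;
    if (saved != (off_t)-1)
        lseek(fd, saved, SEEK_SET); /* put the offset back where it was */
    return 0;
#else
    (void)fd;
    (void)start;
    errno = ENOTSUP; /* SEEK_HOLE/SEEK_DATA not available in these headers */
    return -1;
#endif
}
```

The offset and length this returns are what I would expect to feed into fst_offset and fst_length, if that combination is even meaningful with either fst_posmode value.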
I'm just hoping there are fcntl() and/or APFS gurus out there that know
the answers to these questions.
James Bucanek