UPDATE:
1. Interestingly, ftruncate() works fine on macOS Catalina. Were there some fixes for this in Catalina?
2. I found an open issue for the missing posix_fallocate(): https://openradar.appspot.com/32720223
On Thu, Dec 19, 2019 at 4:32 PM Ilia K <ki.s...@gmail.com> wrote:
> Hi!
>
> I am investigating performance issues with our test cases on a Mac mini
> (2018, Core i7 3.2GHz, 16GB RAM) running macOS Mojave 10.14.6.
>
> Our storage uses file-backed memory mappings, and periodically, when it
> gets too big, we increase the file size using the corresponding function:
> CreateFileMapping on Windows, posix_fallocate on Linux. On macOS, we
> emulate posix_fallocate() with a function that simply calls ftruncate().
>
> It so happened that one of our test cases repeatedly allocates and
> mmap()'s chunks of size >= 4KB, without reading/writing to them. (btw, page
> size and block size are also 4K).
>
> The problem is that sometimes we get unpredictable delays in page faults:
> from tens of seconds to minutes. Usually it happens when accessing the
> mmap()'ed addresses at offsets of ~2050-2090MB.
>
> Well, I tried to implement posix_fallocate() in different ways:
> * ftruncate() -- the easiest one, works both on Linux and on macOS with
>   HFS+. But on APFS a page fault takes about 23 seconds.
> * fcntl(F_SETSIZE) -- the worst page fault time is lower than for
>   ftruncate() (only 11 seconds), but it needs root privileges.
> * fcntl(F_PREALLOCATE) -- I found 3 ways of using it in various open
>   source projects: #1 & #2 seem wrong to me (see the comments in my demo
>   for details), and #3 can cause a page fault lasting 10 minutes.
> * pwrite() -- works slowly but without obscenely long page faults if the
>   step size is 4K. Otherwise, we can also wait in pwrite() for 12 seconds,
>   or get a 13-second page fault.
>
> Here is my posix_fallocate(), the full demo code is in the attachments
> (pagefault_test.c):
> ```C
> #include <assert.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <sys/file.h>
> #include <sys/stat.h>
> #include <unistd.h>
>
> int posix_fallocate(int fd, off_t offset, off_t len) {
>     struct stat stat_buf;
>     if (flock(fd, LOCK_EX) != 0) return errno;
>
>     int err_code = fstat(fd, &stat_buf) == 0 ? 0 : errno;
>     // Only grow the file; never shrink it.
>     if (err_code == 0 && offset + len > stat_buf.st_size) {
> #if defined(IMPL_FTRUNCATE)
>         err_code = ftruncate(fd, offset + len) == 0 ? 0 : errno;
>         // btw, LLVM simply uses ftruncate when posix_fallocate is not available:
>         // https://github.com/llvm/llvm-project/blob/b462cdff05b82071190e8bfd1078a2c76933b19b/llvm/lib/Support/Unix/Path.inc#L559
> #elif defined(IMPL_FCNTL_SETSIZE)
>         unsigned long long arg = offset + len;
>         err_code = fcntl(fd, F_SETSIZE, &arg) != -1 ? 0 : errno;
> #elif defined(IMPL_FCNTL_PREALLOCATE)
>         // I found several ways to use F_PREALLOCATE (uncomment to try it):
>         // 1. Starting from a specific offset. This way is used in Chromium
>         //    (https://chromium.googlesource.com/chromium/src/+/7ca4a2b489b1dd4b5c9b0046d55193b900da06ea/base/files/file_util_posix.cc#901)
>         //    and in the fallocate module for Python
>         //    (https://github.com/trbs/fallocate/blob/9d7aae312ad0d1de6c6451193748e8e8c7e8230d/fallocate/_fallocatemodule.c#L59),
>         //    but I get EINVAL if .fst_offset != 0.
>         //fstore_t store = { F_ALLOCATEALL, F_PEOFPOSMODE, offset, len };
>         //
>         // 2. Specifying the desired file size. Examples: Mozilla
>         //    https://hg.mozilla.org/mozilla-central/file/3d846420a907/xpcom/glue/FileUtils.cpp#l61
>         //    (copies here:
>         //    https://github.com/mozilla/universal-search-gecko-dev/blob/33e34ae066dbdb35ff6889973e21a38792991f35/xpcom/glue/FileUtils.cpp,
>         //    https://github.com/mozilla/integration-mozilla-inbound/blob/0d01aa29ce350beca861f7d3b7b4df399b246ed0/xpcom/glue/FileUtils.cpp),
>         //    one poster at https://forums.developer.apple.com/thread/111312,
>         //    Rust fs2 https://docs.rs/crate/fs2/0.4.3/source/src/unix.rs.
>         //    But as far as I can see, this will allocate offset + len bytes
>         //    per call, regardless of how big the file is at the moment.
>         //fstore_t store = { F_ALLOCATEALL, F_PEOFPOSMODE, 0, offset + len };
>         //
>         // 3. Specifying only the difference between the desired and the
>         //    current file size. The only example I found is Realm core
>         //    https://github.com/realm/realm-core/blob/44152d283878473db8cbf90ac4083dcae44c1852/src/realm/util/file.cpp#L783
>         //    Unfortunately in this case a page fault can take longer than
>         //    with ftruncate(): up to 611 seconds!!!
>         //    ```
>         //    $ ./a.out
>         //    map 2179727360-2179792896
>         //    page fault took 611276 milliseconds
>         //    map 2181300224-2181365760
>         //    page fault took 214747 milliseconds
>         //    ...
>         //    ```
>         fstore_t store = { F_ALLOCATEALL, F_PEOFPOSMODE, 0,
>                            offset + len - stat_buf.st_size };
>
>         // F_PREALLOCATE reserves space but does not change the file size,
>         // so ftruncate() is still needed afterwards.
>         err_code = fcntl(fd, F_PREALLOCATE, &store) != -1 &&
>                    ftruncate(fd, offset + len) != -1 ? 0 : errno;
> #elif defined(IMPL_WRITE)
>         // for 64K: pwrite() can take about 12 seconds, or we can get a
>         // 13-second page fault.
>         int step = 65536;
>         //int step = stat_buf.st_blksize;
>
>         assert(stat_buf.st_size % step == 0); // precondition for this program
>         assert((offset + len) % step == 0);
>         printf("\n");
>         // Write one zero byte at the end of each step-sized block.
>         for (off_t ofs = stat_buf.st_size; ofs < offset + len; ofs += step) {
>             static const char pad = '\0';
>             fprintf(stdout, "writing %lld, step %d\r",
>                     (long long)(ofs + step - 1), step);
>             fflush(stdout);
>             if (pwrite(fd, &pad, 1, ofs + step - 1) == -1) {
>                 err_code = errno;
>                 break;
>             }
>         }
>         printf("\n");
> #else
> #error Select impl
> #endif
>     }
>
>     // Report an unlock failure only if nothing failed earlier.
>     if (flock(fd, LOCK_UN) != 0 && err_code == 0) err_code = errno;
>     return err_code;
> }
> ```
>
> --
> - Ilia
>

--
- Ilia
_______________________________________________
Filesystem-dev mailing list (Filesystem-dev@lists.apple.com)