jvanstraten commented on a change in pull request #12116:
URL: https://github.com/apache/arrow/pull/12116#discussion_r806147259
##########
File path: cpp/src/arrow/memory_pool.cc
##########
@@ -603,14 +643,109 @@ class BaseMemoryPoolImpl : public MemoryPool {
stats_.UpdateAllocatedBytes(-size);
}
- void ReleaseUnused() override { Allocator::ReleaseUnused(); }
+ protected:
+ virtual Status AllocateImmutableZeros(int64_t size, uint8_t** out) {
+#ifdef USE_MMAP_FOR_IMMUTABLE_ZEROS
+ if (size > 0) {
+      *out = static_cast<uint8_t*>(
+          mmap(nullptr, size, PROT_READ,
+               MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0));
+ if (*out == MAP_FAILED) {
+ auto err = errno;
+        return Status::OutOfMemory("Failed to allocate zero buffer of size ",
+                                   size, ": ", strerror(err));
+ }
+ return Status::OK();
+ }
+#endif
+ RETURN_NOT_OK(Allocate(size, out));
+ std::memset(*out, 0, size);
+ return Status::OK();
+ }
+
+ void FreeImmutableZeros(uint8_t* buffer, int64_t size) override {
+#ifdef USE_MMAP_FOR_IMMUTABLE_ZEROS
+ if (size > 0) {
+ munmap(buffer, size);
+ return;
+ }
+#endif
+ Free(buffer, size);
+ }
+
+ public:
+ Result<std::shared_ptr<Buffer>> GetImmutableZeros(int64_t size) override {
+ // Thread-safely get the current largest buffer of zeros.
Review comment:
Compared to my algorithm:
- `+` In terms of thread primitives, your fast path only involves a memory
fence, whereas mine involves a mutex. I'm not sure whether mine is even needed,
though: it isn't if a `shared_ptr` can safely be copied while another thread
may be updating the contained pointer. That feels like a true statement, at
least on x86, but I couldn't confirm it from the C++ docs. If I can remove the
mutex, my fast path is just a `shared_ptr` copy (so, an atomic increment), a
null check, and a size check, which I'm pretty sure is the fastest way to do
this while still implementing reference counting for deallocation.
- `+` Your version doesn't allocate unnecessarily small buffers.
- `+` Your version is more readable, especially compared to my unnecessarily
cryptic reallocation logic.
- `-` Your version has no way to free buffers, so I would argue that it
leaks memory. Granted, the leak is upper-bounded by a bit less than 2x the next
power of two above the largest buffer allocated, so it won't grow without
bound. By comparison, my version releases smaller buffers once they are no
longer used, and frees its cache when `ReleaseUnused()` is called and there
are no other users. I also considered a version where the cache is a
`weak_ptr`, in which case `ReleaseUnused()` would not be needed, but decided
against it mostly because `ReleaseUnused()` already existed.
- `-` Nit, but your version will allocate small buffers regardless of
whether a larger buffer is already available, whereas my version will return
the largest buffer allocated thus far, and will automatically free previously
allocated smaller buffers when all their users go out of scope.
- `-` Also kind of a nit, but rounding up to power-of-two-sized buffers
means that you might throw an out-of-memory error even though almost half of
the allocated memory isn't actually needed. My algorithm backs off and
allocates only as much as is needed if the 2 * previous size allocation fails.
An inability to free something, especially if that something is large, feels
like bad news to me, so I'm hesitant to just copy your version in and call it a
day. But if nothing else, I'll add a lower bound for allocation size and try to
rewrite the allocation algorithm to be less cryptic tomorrow.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]