jvanstraten commented on a change in pull request #12116:
URL: https://github.com/apache/arrow/pull/12116#discussion_r806147259
##########
File path: cpp/src/arrow/memory_pool.cc
##########
@@ -603,14 +643,109 @@ class BaseMemoryPoolImpl : public MemoryPool {
stats_.UpdateAllocatedBytes(-size);
}
- void ReleaseUnused() override { Allocator::ReleaseUnused(); }
+ protected:
+ virtual Status AllocateImmutableZeros(int64_t size, uint8_t** out) {
+#ifdef USE_MMAP_FOR_IMMUTABLE_ZEROS
+ if (size > 0) {
+      *out = static_cast<uint8_t*>(
+          mmap(nullptr, size, PROT_READ,
+               MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0));
+ if (*out == MAP_FAILED) {
+ auto err = errno;
+        return Status::OutOfMemory("Failed to allocate zero buffer of size ",
+                                   size, ": ", strerror(err));
+ }
+ return Status::OK();
+ }
+#endif
+ RETURN_NOT_OK(Allocate(size, out));
+ std::memset(*out, 0, size);
+ return Status::OK();
+ }
+
+ void FreeImmutableZeros(uint8_t* buffer, int64_t size) override {
+#ifdef USE_MMAP_FOR_IMMUTABLE_ZEROS
+ if (size > 0) {
+ munmap(buffer, size);
+ return;
+ }
+#endif
+ Free(buffer, size);
+ }
+
+ public:
+ Result<std::shared_ptr<Buffer>> GetImmutableZeros(int64_t size) override {
+ // Thread-safely get the current largest buffer of zeros.
Review comment:
Compared to my algorithm:
- `+` In terms of thread primitives, your fast path only involves a memory
fence, whereas mine involves a mutex. I'm not sure whether mine is even needed,
though: it isn't if a `shared_ptr` can safely be copied while another thread
may be updating the contained pointer. That feels like a true statement, at
least on x86, but I couldn't confirm it from the C++ docs. If I can remove the
mutex, my fast path is just a `shared_ptr` copy (so, an atomic increment), a
null check, and a size check, which I'm pretty sure is the fastest way to do
this while still implementing reference counting for deallocation.
- `+` Your version doesn't allocate unnecessarily small buffers.
- `+` Your version is more readable, especially compared to my unnecessarily
cryptic reallocation logic.
- `-` Your version has no way to free buffers, so I would argue that it
leaks memory. Granted, the leak is upper-bounded by a bit less than 2x the next
power of two above the largest buffer allocated, so it won't grow without
bound. By comparison, my version releases smaller buffers once they are no
longer used, and frees its cache when `ReleaseUnused()` is called and there
are no other users. I also considered a version where the cache is a
`weak_ptr`, in which case `ReleaseUnused()` would not be needed, but decided
against it mostly because `ReleaseUnused()` already existed.
- `-` Nit, but your version will allocate small buffers regardless of
whether a larger buffer is already available, whereas my version will return
the largest buffer allocated thus far, and will automatically free previously
allocated smaller buffers when all their users go out of scope.
- `-` Also kind of a nit, but rounding up to power-of-two-sized buffers
means that you might throw an out-of-memory error even though almost half of
the allocated memory isn't actually needed. My algorithm backs off and
allocates only as much as is needed if the 2 * previous size allocation fails.
An inability to free something, especially if that something is large, feels
like bad news to me, so I'm hesitant to just copy your version in and call it a
day. But if nothing else, I'll add a lower bound for allocation size and try to
rewrite the allocation algorithm to be less cryptic tomorrow.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]