jvanstraten commented on a change in pull request #12116:
URL: https://github.com/apache/arrow/pull/12116#discussion_r809153416
##########
File path: cpp/src/arrow/device.h
##########
@@ -223,4 +236,55 @@ class ARROW_EXPORT CPUMemoryManager : public MemoryManager
{
ARROW_EXPORT
std::shared_ptr<MemoryManager> default_cpu_memory_manager();
+/// A memory manager that uses the immutable zeros interface of the given
memory pool,
+/// rather than the normal mutable buffer interface.
Review comment:
> And to me 'managing' means 'can read the data backed by the buffer',
not necessarily 'will allocate you exactly the same buffer'.
But that's what `Device` represents according to its
[docstring](https://github.com/apache/arrow/blob/74f512260fa69903feac61e1287f6954a3d98204/cpp/src/arrow/device.h#L36-L39)?
In contrast, you can have multiple memory managers per device, expressly
intended to [provide memory management
primitives](https://github.com/apache/arrow/blob/74f512260fa69903feac61e1287f6954a3d98204/cpp/src/arrow/device.h#L81-L85)
for some particular allocation method, such as for some particular memory
pool. The allocation method for immutable zero buffers is fundamentally
different, so I stand by my interpretation of those docstrings that immutable
zero buffers warrant a new memory manager class. However, I certainly can't
vouch for the correctness of those docstrings or the usefulness/applicability
of this layer if defined as such.
Furthermore, if the intention of associating a `MemoryManager` with a buffer
is only to specify which device can actually access it and how (which seems
very sensible to me) rather than also allowing similar buffers to be allocated
(which indeed does not seem very useful to me), the association shouldn't have
been a memory manager but a `Device`. After all, the only other information
provided by `MemoryManager` on top of its associated `Device` is how to
allocate buffers. The rest of its methods should IMO be moved to `Device`. If
`Device` should be kept clean of memory-related stuff, there should be an
intermediate layer called `MemoryRegion` or `AddressSpace` or something that
decouples allocation strategies from address space/the significance and
accessibility of a pointer. In general a device can have multiple address
spaces, anyway (such as NUMA nodes on CPU, different memory banks on an
accelerator, etc.). Whether `MemoryManager` is still relevant on top of
`MemoryPool` at t
hat point is disputable, but I guess the biggest difference is that
`MemoryPool` operates on pointers to bytes, whereas `MemoryManager` yields
buffers that clean themselves up via RAII.
Nevertheless, I feel like this discussion is getting to the point where it
shouldn't be held in a github thread on a mostly unrelated PR (what *would* be
the right place? a JIRA issue?), and in general I don't really have the
bandwidth or the use cases right now to dive further into this rabbit hole. So,
for now, I've just reverted the additions I made relating to `MemoryManager`
stuff, and we (or other people) can delve into this if/when it becomes
relevant. 976dcf6
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]