On 24/04/10 05:56PM, Antoine Pitrou wrote:
> 
> Hello John,
> 
> Arrow IPC files can be backed quite naturally by shared memory, simply by
> memory-mapping them for reading. So if you have some pieces of shared memory
> containing Arrow IPC files, and they are reachable using a filesystem mount
> point, you're pretty much done.
> 
> You can see an example of memory-mapped read in Python at the end of this
> documentation section:
> https://arrow.apache.org/docs/python/ipc.html#efficiently-writing-and-reading-arrow-data


Antoine,

Yes, thanks - we've run some tests using the pyarrow.memory_map()
interface. It works well with famfs.
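In case it's useful to anyone else, the core of our test looks
roughly like the following. A minimal sketch; the mount point and
file name are made up, and it assumes an Arrow IPC file has already
been placed on the famfs mount:

    import pyarrow as pa
    import pyarrow.ipc as ipc

    # Hypothetical famfs mount point and file name
    path = "/mnt/famfs/example.arrow"

    # Map the IPC file rather than reading it; the resulting Arrow
    # buffers point directly into the mapping, so nothing is copied
    # into the process heap.
    with pa.memory_map(path, "r") as source:
        table = ipc.open_file(source).read_all()

    # Stays at (or near) zero, confirming the zero-copy read -- this
    # is the same check used in the pyarrow docs linked above.
    print(pa.total_allocated_bytes())
    print(table.num_rows)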

And please forgive me if this is obvious; I have been learning that
it is pretty arcane and not obvious to everybody...

There is a crucial distinction between a "normal" memory-mapped file
and a file mapped from a shared-memory file system (e.g. famfs).

With a "normal" file system (xfs, btrfs, etc.) memory-mapping a file
creates a virtual address range that maps to the data in the file,
but the memory footprint starts out sparse and data is demand-paged
from the backing media (SSD etc.) at minimum 4K granularity.
System-RAM functions as a memory-mapped cache for the data on the
backing media.

With a shared memory file system (famfs), data is not cached in
system-RAM; the data is already in byte-addressable memory, so
there are never page faults, just memory accesses. Software sees
the "same" behavior, with the performance benefit of no page faults.

Now add the fact that the memory may actually be shared across
multiple servers (and concurrently mounted); it still works "the
same" as normal memory-mapped files, but multiple servers are
accessing a single copy of an Arrow frame without putting any of it
in system-RAM. This has the performance benefit of no page faults,
the capacity benefit of no duplication, and the network benefit of no
shuffling (because the files can be found and mapped from every
node). And since memory-mapping a shared-memory file uses almost no
system-RAM, more is available for other uses.
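A rough sketch of that cross-node pattern, purely for illustration.
The path and the table are made up; it assumes the famfs mount is
visible at the same path on every node, and that the file has been
provisioned through the usual famfs tooling (so the plain OSFile
write below is only a stand-in for however the data actually lands
in shared memory):

    import pyarrow as pa
    import pyarrow.ipc as ipc

    path = "/mnt/famfs/shared_frame.arrow"   # hypothetical shared path

    # On one node: write the Arrow IPC file once into the shared mount.
    table = pa.table({"x": list(range(1000))})
    with pa.OSFile(path, "wb") as sink:
        with ipc.new_file(sink, table.schema) as writer:
            writer.write_table(table)

    # On every node (including the writer): map the single shared copy
    # read-only. No per-node copy in system-RAM, nothing shuffled over
    # the network.
    with pa.memory_map(path, "r") as source:
        shared = ipc.open_file(source).read_all()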

If the memory is not being mutated (i.e. mapped read-only), there
are no gnarly cache coherency issues. If it *is* being mutated, there
are no *new* issues; it's the same problem set that software must
manage when mutating shared memory within a single host, although
synchronization techniques get somewhat more expensive when the
memory is shared across hosts.
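For what it's worth, that read-only vs. mutable distinction maps
directly onto the mmap mode (hypothetical path again; whether a
writable mapping makes sense depends entirely on how the sharing is
set up):

    import pyarrow as pa

    path = "/mnt/famfs/example.arrow"   # hypothetical

    # Read-only mapping: the buffers are immutable, so there is
    # nothing to keep coherent between readers on this host or any
    # other host.
    ro = pa.memory_map(path, "r")

    # Writable mapping: ordinary shared-memory rules apply -- the
    # application supplies its own synchronization, just as it would
    # for mutable shared memory within a single host.
    rw = pa.memory_map(path, "r+")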

I see this as "yet another reason" why the Arrow formats make
sense. It probably wasn't foreseen, but I think there is some
major potential leverage here.

I hope this helps *somebody*, and helps move the conversation
along.


> 
> Note: Arrow IPC files are just a way of storing Arrow columnar data on
> "disk", with enough additional metadata to interpret the data (such as its
> schema).
[snip]

Also a great fit for shared-memory that looks like files!

Thanks for reading,
John Groves
Micron
