zeroshade commented on code in PR #41180:
URL: https://github.com/apache/arrow/pull/41180#discussion_r1567502147


##########
docs/source/format/DissociatedIPC.rst:
##########
@@ -0,0 +1,335 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _dissociated-ipc:
+
+========================
+Dissociated IPC Protocol
+========================
+
+.. warning::
+
+    Experimental: The Dissociated IPC Protocol is experimental in its current
+    form. Based on feedback and usage the protocol definition may change until
+    it is fully standardized.
+
+Rationale
+=========
+
+The :ref:`Arrow IPC format <format-ipc>` describes a protocol for transferring
+Arrow data as a stream of record batches. This protocol expects a continuous
+stream of bytes divided into discrete messages (using a length prefix and
+continuation indicator). Each discrete message consists of two portions:
+
+* A `Flatbuffers`_ header message
+* A series of bytes consisting of the flattened and packed body buffers (some
+  message types, like Schema messages, do not have this section)
+  - This is referred to as the *message body* in the IPC format spec.
+
+For most cases, the existing IPC format as it currently exists is extremely efficient:
+
+* Receiving data in the IPC format allows zero-copy utilization of the body
+  buffer bytes, no deserialization is required to form Arrow Arrays
+* An IPC (Feather) file can be memory-mapped because it is location agnostic
+  and the bytes of the file are exactly what is expected in memory.
+
+However, there are use cases that aren't handled by this:
+
+* Constructing the IPC record batch message requires allocating a contiguous
+  chunk of bytes and copying all of the data buffers into it, packed together
+  back-to-back. It's exceedingly difficult to zero-copy **create** IPC messages.
+* If the Arrow data is located in a shared-memory location, there is no standard
+  way to share the handle to the shared-memory across processes or transports that
+  allow for remote memory accessing.
+* Arrow data located on a non-CPU device (such as a GPU) cannot be sent using
+  Arrow IPC without having to copy the data back to the host device or copying
+  the flatbuffer metadata bytes into device memory.
+  - By the same token, receiving IPC messages into device memory would require
+    performing a copy of the flatbuffer metadata back to the host CPU device. This
+    is due to the fact that the IPC stream interleaves data and metadata across a
+    single stream.
+
+This protocol is intended to attempt to solve these use cases in an efficient manner.
+
+Goals
+-----
+
+* Define a generic protocol for passing Arrow IPC data, not tied to any particular
+  transport, that also allows for utilizing non-CPU device memory, shared memory, and
+  newer "high performance" transports such as `ucx`_ or `libfabric`_.
+* Allow for using :ref:`Flight RPC <flight-rpc>` purely for control flow by separating
+  the stream of IPC metadata from IPC body bytes
+  - This allows for the data in the body to be kept on non-CPU devices (like GPUs)
+    without expensive Device -> Host copies.
+
+Definitions
+-----------
+
+.. glossary::

Review Comment:
   hmm, okay. I was using it for the formatting mostly, but that's fine. I can
   change this to use something else instead of "glossary".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
