westonpace commented on issue #35394:
URL: https://github.com/apache/arrow/issues/35394#issuecomment-1540512056

   The Arrow spec defines two things:
    * Instructions on how to lay out data in buffers
    * Instructions on how to write out all the metadata in the IPC format
   
   When we go from IPC -> C++ we keep the buffers identical (this is what is
meant by "zero-copy").  However, the metadata is converted from flatbuffers to
C++ objects (we generally don't consider the metadata when we say "zero-copy").
For example, the flatbuffers "Schema" table (defined here
https://github.com/apache/arrow/blob/18c976048bc989cf9d2c31139b67f7cc8e143d66/format/Schema.fbs#L517)
becomes the Arrow-C++ `arrow::Schema` object (which stores its fields in, for
example, a `std::vector`).  A `pyarrow.Schema` object then holds (via Cython) a
`std::shared_ptr<arrow::Schema>`.
   
   So there are a few options:
   
    * If you just need the buffers you can easily get them with `pyarrow` (e.g. 
`pa.array([1, 2, 3]).buffers()[1].to_pybytes()`).  The contents of these 
buffers are stable and defined by the Arrow spec.
    * Serialize to the IPC format (e.g. `pa.ipc.RecordBatchStreamWriter`).  The 
contents are stable and defined by the Arrow IPC spec.
    * Export through the C Data Interface.  The layout is stable and defined by 
the Arrow C Data Interface spec.
   
   

