pbower commented on issue #11239:
URL: https://github.com/apache/arrow/issues/11239#issuecomment-2122356170

   Hi,
   
   I have a general question regarding Pyarrow, and this seems to be the best 
place to discuss it.
   
   Isn't the whole point of Pyarrow to be a leading data interchange format? If 
we're picking objects and sending them over the network, doesn't that presume 
the language on the other end is Python? My understanding is that Pyarrow 
should allow reading data into a Pyarrow object on the other side and handle 
the mapping for various supported data types.
   
   Some might suggest using Pyarrow Flight RPC or protobuf, but I believe these 
don't fully solve the use cases because:
   
   -  Flight RPC doesn't address how to serialize non-DataFrame objects, 
leading to compatibility issues.
   - Custom protobuf for every data and collection type adds complexity 
compared to serializing everything to bytes.
   - Serializing to disk can cause file locking and concurrency issues when 
reading or writing constantly.
   
   My use case is that Pyarrow supports many data types, including tensors and 
dictionaries, which is great. However, I can't serialize them as a data 
interchange format unless the other end is Python.
   
   Is there a way for common object types like numpy arrays to zero-memory 
serialize based on their respective Pyarrow objects, or is this not possible?
   
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to