pbower commented on issue #11239: URL: https://github.com/apache/arrow/issues/11239#issuecomment-2122356170
Hi, I have a general question regarding PyArrow, and this seems to be the best place to discuss it. Isn't the whole point of PyArrow to be a leading data interchange format? If we're pickling objects and sending them over the network, doesn't that presume the language on the other end is Python? My understanding is that PyArrow should allow reading the data into an Arrow object on the other side and handle the mapping for the various supported data types.

Some might suggest using Arrow Flight RPC or protobuf, but I believe these don't fully solve the use cases because:

- Flight RPC doesn't address how to serialize non-DataFrame objects, leading to compatibility issues.
- Writing a custom protobuf for every data and collection type adds complexity compared to serializing everything to bytes.
- Serializing to disk can cause file locking and concurrency issues when reading and writing constantly.

My use case is that PyArrow supports many data types, including tensors and dictionaries, which is great. However, I can't serialize them as a data interchange format unless the other end is Python. Is there a way for common object types like numpy arrays to be zero-copy serialized via their respective PyArrow objects, or is this not possible? Thanks.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
