Hi Yevgeni,

I don't think we have ever promised binary stability of the
pyarrow.serialize() protocol.  Binary compatibility starting from 1.0.0
is about the Arrow in-memory format and the Arrow IPC format (i.e. how
Arrow arrays, tables... are laid out and how their metadata is encoded
on the wire).

So I would not recommend using pa.serialize() for storage.  If you want
to store data, you should use a well-known file format (or a combination
thereof), such as Parquet.

Regards

Antoine.


On 23/08/2019 at 07:25, Yevgeni Litvin wrote:
> In our system we are using Arrow serialization, as it showed excellent
> deserialization speed. However, it seems that we made a mistake by
> persisting the streams into long-term storage, as the serialized data
> appears to be incompatible between versions. According to the release
> notes of 0.14.0, it appears that starting with 1.0.0 binary compatibility
> will be maintained. My question is whether pyarrow.serialize is also
> guaranteed to maintain binary compatibility starting with Arrow 1.0, and
> whether it would be safe to persist its output then (or maybe even
> starting now, with 0.14)?
> 
> (From my quick test, 0.13 is not compatible with 0.12 and earlier, while
> it is compatible with 0.14.)
> 
> Thank you,
> 
> - Yevgeni
> 
