Re: No replacement dictionaries supported in pyarrow?

Antoine Pitrou Fri, 19 Mar 2021 11:57:48 -0700

One more general question is whether the file format is really beneficial over the stream format in practice. I understand the theoretical argument for direct access to specific batches, but are there situations where it really matters? Intuitively, it seems to me that if your data is really large, you may be better off with a more space-optimized format such as Parquet.



On 19/03/2021 at 19:49, Wes McKinney wrote:

Okay, let’s open an issue then to address that at some point. What I recall
from our last discussion was that the dictionaries would be “processed”
when beginning to read the file, appending all the deltas to yield one set
of dictionaries for reassembly. The downside is that the “partial
dictionaries” that existed at the time that the file was written are not
recoverable, but that seems like an acceptable compromise.
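
The approach described above could be sketched roughly like this. It is a plain-Python illustration, not the Arrow C++ or pyarrow implementation: the `(dict_id, is_delta, values)` tuples and the function names `resolve_dictionaries` and `decode` are hypothetical stand-ins for the dictionary messages found in a file.

```python
def resolve_dictionaries(messages):
    """Fold dictionary messages into one final dictionary per id.

    `messages` is an iterable of (dict_id, is_delta, values) tuples,
    a hypothetical simplification of Arrow dictionary batches.
    """
    dictionaries = {}
    for dict_id, is_delta, values in messages:
        if is_delta:
            if dict_id not in dictionaries:
                raise ValueError(f"delta for unknown dictionary {dict_id}")
            # Deltas only append, so earlier indices stay valid.
            dictionaries[dict_id].extend(values)
        elif dict_id in dictionaries:
            # A second non-delta message would be a replacement.
            raise ValueError(f"replacement of dictionary {dict_id} not supported")
        else:
            dictionaries[dict_id] = list(values)
    return dictionaries


def decode(indices, dictionary):
    """Map dictionary indices back to values."""
    return [dictionary[i] for i in indices]


messages = [
    (0, False, ["a", "b"]),   # initial dictionary
    (0, True, ["c"]),         # first delta
    (0, True, ["d", "e"]),    # second delta
]
resolved = resolve_dictionaries(messages)
early_batch = decode([0, 1, 0], resolved[0])  # written before any delta
late_batch = decode([0, 2, 4], resolved[0])   # written after both deltas
```

Because deltas only append, a batch written before a delta still decodes correctly against the merged dictionary. What is lost is only the knowledge that the dictionary was `["a", "b"]` at the time that batch was written.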

On Fri, Mar 19, 2021 at 10:34 AM Antoine Pitrou <anto...@python.org> wrote:


On 19/03/2021 at 13:37, Wes McKinney wrote:

I am also under the impression that the file format is supposed to support
deltas, but not replacements. Is this not implemented in C++?


Definitely not.  Also I was not aware that the file format was supposed
to support deltas.

Regards

Antoine.
