Hi,
I'm using pyarrow 0.10.
I have a dataframe of about 90 GB in memory, with one object column containing
strings of at most 27 characters.
basket_plateau.to_parquet("basket_plateau.parquet", compression=None) writes
the file to disk just fine.
basket_plateau = pd.read_parquet("basket_plateau.parquet"), however, fails with:
ArrowIOError: Arrow error: Capacity error: BinaryArray cannot contain more than
2147483646 bytes, have 2147483655
I can reproduce the exact same error when I use pyarrow directly:
pq.write_table(pa.Table.from_pandas(basket_plateau), "basket_plateau.parquet")
basket_plateau = pq.read_table("basket_plateau.parquet")
Kr.
Fred
[ Full content available at: https://github.com/apache/arrow/issues/2485 ]