Hi,

I'm using pyarrow 0.10.

I have a dataframe of about 90 GB in memory, with one object column
containing strings of at most 27 characters.

basket_plateau.to_parquet("basket_plateau.parquet", compression=None) writes
the file to disk just fine, but
basket_plateau = pd.read_parquet("basket_plateau.parquet") fails with:

ArrowIOError: Arrow error: Capacity error: BinaryArray cannot contain more than 
2147483646 bytes, have 2147483655
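
For reference, a minimal standalone sketch that should hit the same limit
(synthetic data rather than my real table; the exact threshold depends on how
the string bytes land in a single row group / column chunk):

import pandas as pd

# ~2.2 GB of string data in one object column: 100 million rows of a
# 22-character string, which pushes past the 2**31 - 1 byte limit of a
# single Arrow BinaryArray on the read path.
n = 100_000_000
df = pd.DataFrame({"s": ["x" * 22] * n})
df.to_parquet("repro.parquet", compression=None)  # write succeeds
df2 = pd.read_parquet("repro.parquet")            # raises ArrowIOError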

I can reproduce the exact same error when I use pyarrow directly:

import pyarrow as pa
import pyarrow.parquet as pq

pq.write_table(pa.Table.from_pandas(basket_plateau), "basket_plateau.parquet")
basket_plateau = pq.read_table("basket_plateau.parquet")
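
For anyone else running into this, a possible workaround sketch (untested at
the full 90 GB scale; row_group_size, ParquetFile.read_row_group and
concat_tables are standard pyarrow APIs, though I haven't confirmed the exact
behaviour on 0.10): write with smaller row groups so no single column chunk
exceeds the 2 GB string limit, then read it back one row group at a time:

import pyarrow as pa
import pyarrow.parquet as pq

# Write with ~10M-row row groups so each column chunk stays well under
# 2**31 bytes of string data.
pq.write_table(pa.Table.from_pandas(basket_plateau),
               "basket_plateau.parquet",
               row_group_size=10_000_000)

# Read back row group by row group and stitch the pieces together; the
# result is a chunked Arrow table, so no single BinaryArray has to hold
# all of the string bytes at once.
pf = pq.ParquetFile("basket_plateau.parquet")
pieces = [pf.read_row_group(i) for i in range(pf.num_row_groups)]
basket_plateau = pa.concat_tables(pieces).to_pandas()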

Kr.
Fred

[ Full content available at: https://github.com/apache/arrow/issues/2485 ]