[GitHub] [arrow] adrienchaton commented on issue #14229: OSError: List index overflow.

GitBox Thu, 29 Sep 2022 01:11:19 -0700


adrienchaton commented on issue #14229:
URL: https://github.com/apache/arrow/issues/14229#issuecomment-1261921991


   The column dtypes are (number of columns of each type in parentheses):
   Int64(1), UInt16(15), UInt64(1), bool(11), boolean(6), object(15)
   The object columns are strings.
   
   What I meant by a bounded index is that if I print the dataframe index before storing it to Parquet, I get no overflow error (just a proper integer range). However, if I try to load this dataframe from the Parquet file, I get an overflow error.
   
   I understand you're saying that the number of columns shouldn't matter, but it is strange that I do not get an overflow error if I load only a single-column subset of the dataframe...
   
   In the meantime, it's not that bad to use the trick of chunking the dataframe into several .parquet files and concatenating them back when loading... It's just bugging me (I used to store dataframes with hundreds of millions of rows in single Parquet files without this error popping up).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.