adrienchaton commented on issue #14229: URL: https://github.com/apache/arrow/issues/14229#issuecomment-1261921991
The column dtypes are Int64(1), UInt16(15), UInt64(1), bool(11), boolean(6), object(15); the object columns are strings.

What I meant by "bounded index" is that if I print the dataframe index before storing it to parquet, I get no overflow error (just a proper integer range). However, if I try to load this dataframe from the parquet file, I get an overflow error.

I believe you when you say that the number of columns shouldn't matter, but it is strange that I do not get an overflow error if I only load a single-column subset of the dataframe ...

In the meantime, it's not that bad to use the trick of chunking the dataframe into several .parquet files and concatenating them back when loading ... it's just bugging me (I used to store dataframes with hundreds of millions of rows in single parquet files without this error popping up).
