jorisvandenbossche commented on issue #14229: URL: https://github.com/apache/arrow/issues/14229#issuecomment-1282287198
> Another thing that could help is if we could break the data-frame across row-groups (I'm surprised for this size dataframe we don't have defaults in place that would already do that).

I don't fully understand how the dataframe is involved here. If I read the above correctly, it is the reading of the Parquet file into an Arrow table that is failing (and not the conversion of the dataframe to a pyarrow table for writing, nor the conversion of the pyarrow table back to a dataframe after reading)?

When converting a large dataframe like this, I think we automatically use a chunked array to be able to represent the data in a ListType. But when reading from Parquet, I would assume we also use chunks per record batch?
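As a minimal sketch of the two workarounds being discussed (the file path, batch size, and row-group size below are placeholder values, not anything from the original report): writing with smaller row groups, and reading record batch by record batch so each list column stays split across chunks instead of being materialized as one contiguous array.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical example: write with an explicit, smaller row-group size
# so readers can process the file in pieces.
# pq.write_table(table, "data.parquet", row_group_size=64_000)

# Read the file back in record batches instead of one contiguous table.
pf = pq.ParquetFile("data.parquet")
batches = pf.iter_batches(batch_size=64_000)

# Assembling the batches into a Table keeps each column as a
# ChunkedArray (one chunk per batch) rather than a single large array.
table = pa.Table.from_batches(batches)
```

Whether this avoids the failure depends on where the overflow happens; if the reader already chunks per record batch internally, as assumed above, the behavior should be similar.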
