emkornfield commented on issue #14229:
URL: https://github.com/apache/arrow/issues/14229#issuecomment-1282747916

   > I don't fully understand how the dataframe is involved here. If I read the above correctly, it is the reading of a Parquet file into an Arrow table that is failing? (and not the conversion of the dataframe -> pyarrow table (for writing), or after reading the conversion of pyarrow table -> dataframe)
   
   This is my understanding as well.
   
   > When converting a large dataframe like this, I think we automatically use chunked array to be able to represent this in a ListType? But when reading from Parquet, I would assume we also use chunks per record batch?
   
   Yes, I wasn't thinking clearly.  One possible conclusion is that we aren't doing chunking when reading parquet -> arrow -> pandas. Is that possible?
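   
   For reference, a minimal sketch of how one might check whether chunking happens on the read path (the file path and batch size below are illustrative, not from this issue):
   
   ```python
   import pyarrow as pa
   import pyarrow.parquet as pq
   
   # Read the whole file at once; each column of the resulting Table is a
   # ChunkedArray, so num_chunks shows how the read was split up.
   table = pq.read_table("large_list_column.parquet")  # hypothetical file
   for field, column in zip(table.schema, table.columns):
       print(field.name, column.num_chunks)
   
   # Alternatively, iterate record batches explicitly and reassemble, which
   # forces smaller chunks regardless of how read_table would split the data.
   pf = pq.ParquetFile("large_list_column.parquet")
   chunked = pa.Table.from_batches(list(pf.iter_batches(batch_size=65536)))
   print(chunked.column(0).num_chunks)
   ```
   
   If `read_table` reports a single chunk for a very large list column, that would be consistent with the chunking-not-happening theory above.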

