jorisvandenbossche commented on issue #14229: URL: https://github.com/apache/arrow/issues/14229#issuecomment-1282287198
> Another thing that could help is if we could break the data-frame across row-groups (I'm surprised for this size dataframe we don't have defaults in place that would already do that).

I don't fully understand how the dataframe is involved here. If I read the above correctly, it is the reading of the Parquet file into an Arrow table that is failing (and not the conversion of the dataframe to a pyarrow table for writing, nor the conversion of the pyarrow table back to a dataframe after reading)?

When converting a large dataframe like this, I think we automatically use a chunked array to be able to represent the data in a ListType. But when reading from Parquet, I would assume we also use chunks per record batch?
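As a minimal sketch of the two workarounds being discussed (the file path, batch size, and row-group size below are placeholder values, not anything from the original report): writing with smaller row groups, and reading record batch by record batch so each list column stays split across chunks instead of being materialized as one contiguous array.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical example: write with an explicit, smaller row-group size
# so readers can process the file in pieces.
# pq.write_table(table, "data.parquet", row_group_size=64_000)

# Read the file back in record batches instead of one contiguous table.
pf = pq.ParquetFile("data.parquet")
batches = pf.iter_batches(batch_size=64_000)

# Assembling the batches into a Table keeps each column as a
# ChunkedArray (one chunk per batch) rather than a single large array.
table = pa.Table.from_batches(batches)
```

Whether this avoids the failure depends on where the overflow happens; if the reader already chunks per record batch internally, as assumed above, the behavior should be similar.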
