jorisvandenbossche commented on pull request #7545: URL: https://github.com/apache/arrow/pull/7545#issuecomment-657186524
Rebased. This depends on https://github.com/apache/arrow/pull/7704 (ARROW-9297) to fix the large_memory failure noted above (https://github.com/apache/arrow/pull/7545#issuecomment-649631989).

In addition, we should probably decide whether to use dictionary type for the partition fields. Right now we do (and not only for strings, but also for integers), whereas the default in the datasets API is the plain (string or int) type. We could specify an option to keep the existing behaviour for `parquet.read_table`, although that would create an inconsistency between `pyarrow.dataset` and the dataset-based `pyarrow.parquet`.
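To make the two options concrete, here is a rough plain-Python sketch of what "plain" versus "dictionary" partition fields mean for values parsed from hive-style paths. It deliberately does not use pyarrow, and it is not how Arrow implements either behaviour. The helper names (`parse_partition_value`, `plain_column`, `dictionary_column`) and the example paths are hypothetical, made up for illustration only.

```python
from typing import Dict, List, Tuple, Union

Value = Union[int, str]


def parse_partition_value(path: str, key: str) -> Value:
    """Extract `key` from a hive-style path such as 'year=2019/part-0.parquet'."""
    for segment in path.split("/"):
        name, sep, raw = segment.partition("=")
        if sep and name == key:
            # Infer an integer when the text parses as one, else keep the string.
            try:
                return int(raw)
            except ValueError:
                return raw
    raise KeyError(key)


def plain_column(paths: List[str], key: str) -> List[Value]:
    """Plain representation: store every value as-is (int or string)."""
    return [parse_partition_value(p, key) for p in paths]


def dictionary_column(paths: List[str], key: str) -> Tuple[List[int], List[Value]]:
    """Dictionary representation: integer indices into a list of unique values."""
    dictionary: List[Value] = []
    positions: Dict[Value, int] = {}
    indices: List[int] = []
    for value in plain_column(paths, key):
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return indices, dictionary


# Hypothetical paths; one entry per file here, whereas a real column
# would repeat the partition value for every row of that file.
paths = [
    "year=2019/part-0.parquet",
    "year=2019/part-1.parquet",
    "year=2020/part-0.parquet",
]
plain = plain_column(paths, "year")
indices, dictionary = dictionary_column(paths, "year")
```

Both representations carry the same values, so the question is only which type users see in the resulting table schema: the dictionary form saves space when a value repeats across many rows, while the plain form is what the datasets API returns by default.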