[GitHub] [arrow] jorisvandenbossche commented on pull request #7545: ARROW-9139: [Python] Switch parquet.read_table to use new datasets API by default

GitBox Sun, 12 Jul 2020 00:34:14 -0700


jorisvandenbossche commented on pull request #7545:
URL: https://github.com/apache/arrow/pull/7545#issuecomment-657186524



   Rebased. 
   
   This depends on https://github.com/apache/arrow/pull/7704 (ARROW-9297), 
which fixes the large_memory failure noted above 
(https://github.com/apache/arrow/pull/7545#issuecomment-649631989).
   
   In addition, we should probably also decide whether we want to use a 
dictionary type for the (string) partition fields. Right now we do (and 
not only for strings, but also for integers), whereas the datasets API 
defaults to the plain (string or int) type. We can specify an option to 
keep the existing behaviour for `parquet.read_table`, although that also 
creates an inconsistency between `pyarrow.dataset` and `pyarrow.parquet` 
when the latter uses datasets.

