jorisvandenbossche commented on pull request #7545:
URL: https://github.com/apache/arrow/pull/7545#issuecomment-658662358


   To clarify:
   
   - This PR currently doesn't use dictionary encoding for partition fields of 
any type, so not for strings either
   - For strings I could add it fairly easily (it's an option that can be set 
in the datasets API)
   - For ints it isn't possible as long as the datasets API doesn't support 
it (dictionary encoding the ints after reading is possible, but the resulting 
dictionary won't necessarily contain all unique values if a filter was 
applied)
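   
   A minimal sketch of the last point, in plain Python rather than through the 
Arrow API: the `dictionary_encode` helper, the `year` partition field, and the 
sample rows are all hypothetical and only illustrate why encoding after a 
filtered read can miss values.

```python
def dictionary_encode(values):
    """Return (dictionary, indices); dictionary is in order of first appearance."""
    dictionary = []
    positions = {}
    indices = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


# Hypothetical dataset partitioned on an integer "year" field:
# one directory per partition value.
partition_values = [2018, 2019, 2020]
rows = [(2018, "a"), (2019, "b"), (2019, "c"), (2020, "d")]

# Read with a filter applied, then dictionary-encode the partition column.
filtered_years = [year for year, _ in rows if year >= 2019]
dictionary_after_read, indices = dictionary_encode(filtered_years)

# A dictionary derived from the partition directories would cover every value;
# the one built after the filtered read only sees the surviving rows.
missing = sorted(set(partition_values) - set(dictionary_after_read))
print(dictionary_after_read, indices, missing)
```

   Here the dictionary built after reading lacks 2018, because no row with 
that value survived the filter, whereas a dictionary built from the partition 
information itself would include it.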
   
   I will at least quickly experiment with enabling the dictionary encoding, or 
providing an option for it.



