jorisvandenbossche commented on pull request #7545: URL: https://github.com/apache/arrow/pull/7545#issuecomment-658662358
To clarify:

- The PR currently doesn't use dictionary encoding for any type of partition field, so not for strings either.
- For strings I could add it rather easily (it's an option in the datasets API that can be set).
- For ints it's not actually possible as long as the datasets API doesn't support it (dictionary encoding the ints after reading is possible, but won't necessarily give you all unique values in the dictionary if you applied a filter).

I will at least quickly experiment with enabling the dictionary encoding, or providing an option for it.
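To illustrate the caveat about ints, here is a minimal plain-Python sketch of why encoding after reading can lose dictionary values. It does not use the pyarrow datasets API; the `dictionary_encode` helper, the `year` partition field, and the sample values are all hypothetical and exist only for illustration.

```python
def dictionary_encode(values):
    """Return (dictionary, indices), with the dictionary in first-seen order.

    Hypothetical helper that mimics dictionary encoding on a plain list.
    """
    dictionary = []
    positions = {}
    indices = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


# Hypothetical integer partition field: one "year" value per row,
# as would be derived from directories such as year=2018/, year=2019/, ...
year_column = [2018, 2018, 2019, 2019, 2020]

# Encoding the unfiltered column sees every partition value.
full_dictionary, _ = dictionary_encode(year_column)

# Applying a filter first and encoding afterwards only sees the surviving rows,
# so the resulting dictionary no longer contains 2018.
filtered_column = [year for year in year_column if year >= 2019]
filtered_dictionary, filtered_indices = dictionary_encode(filtered_column)
```

Under these assumptions, `full_dictionary` holds all three years while `filtered_dictionary` holds only the two that passed the filter, which is the situation described above for ints encoded after reading.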
