nealrichardson commented on pull request #7545: URL: https://github.com/apache/arrow/pull/7545#issuecomment-658507782
> I think the rationale is that the memory and performance savings related to materializing the partition columns are most significant with string data. So it's definitely beneficial to return them as dictionary types.

Right, my understanding from Joris's last comment was that this was already converting strings to dictionaries, which seems like a reasonable (though not mandatory) choice, and that the hangup was whether it was essential to also do that for ints.

I guess the other workaround, if people aren't happy with the choice here, is to set `use_legacy_dataset = True`, so I agree that it's not the end of the world if the choice we make about dictionaries today turns out not to be optimal. But we should merge this so that the datasets API becomes the default and we can learn where exactly we were mistaken.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org