[GitHub] [arrow-datafusion] rdettai commented on pull request #1860: Increase default partition column type from Dict(UInt8) to Dict(UInt16)

GitBox Wed, 02 Mar 2022 02:29:16 -0800


rdettai commented on pull request #1860:
URL: 
https://github.com/apache/arrow-datafusion/pull/1860#issuecomment-1056764995



   The idea behind using UInt8 is that, within a single file, the values of a 
given partition column are all identical. If I have to materialize a large 
array containing only zeros, I would rather not encode each 0 in 64 bits 😄. 
To actually get a record batch with multiple partition values, you would need 
to go through something like the `concat` kernel first. Wouldn't it make sense 
to rely on that kernel to re-cast the index type appropriately? I think that 
would be a safer approach in general, since it avoids overflowing the index 
type when merging dictionaries.
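
To make the suggestion concrete, here is a minimal sketch of the idea. It uses only the Rust standard library; `DictColumn`, `partition_column` and `concat_widening` are hypothetical names invented for illustration, not the arrow-rs or DataFusion API, and the choice of `u16` as the widened key type is an assumption.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

/// Hypothetical stand-in for a dictionary-encoded string column.
/// This is NOT the arrow-rs `DictionaryArray` type.
#[derive(Debug, Clone, PartialEq)]
struct DictColumn<K> {
    /// One key per row, indexing into `values`.
    keys: Vec<K>,
    /// The distinct dictionary values.
    values: Vec<String>,
}

/// Partition column for a single file: one dictionary value, and every
/// row's key is 0, so one byte per row is enough.
fn partition_column(value: &str, num_rows: usize) -> DictColumn<u8> {
    DictColumn {
        keys: vec![0u8; num_rows],
        values: vec![value.to_string()],
    }
}

/// Concatenate per-file columns, merging their dictionaries and widening
/// the keys to u16 (an assumed target type). Returns an error instead of
/// silently overflowing if the merged dictionary does not fit in u16.
fn concat_widening(parts: &[DictColumn<u8>]) -> Result<DictColumn<u16>, String> {
    let mut values: Vec<String> = Vec::new();
    let mut index: HashMap<String, u16> = HashMap::new();
    let mut keys: Vec<u16> = Vec::new();

    for part in parts {
        // Map each of this part's local keys to a key in the merged dictionary.
        let mut remap: Vec<u16> = Vec::with_capacity(part.values.len());
        for v in &part.values {
            let merged_key = match index.get(v) {
                Some(&k) => k,
                None => {
                    let k = u16::try_from(values.len())
                        .map_err(|_| "merged dictionary exceeds u16 key range".to_string())?;
                    values.push(v.clone());
                    index.insert(v.clone(), k);
                    k
                }
            };
            remap.push(merged_key);
        }
        // Rewrite the rows using the merged keys.
        for &key in &part.keys {
            keys.push(remap[key as usize]);
        }
    }

    Ok(DictColumn { keys, values })
}
```

In this sketch each file keeps its one-byte keys, and the wider key type is only paid for once columns from several files are actually merged, which is also the one place where an overflow can be detected.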


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
