tanruixiang commented on issue #6784:
URL: 
https://github.com/apache/arrow-datafusion/issues/6784#issuecomment-1627577672

   > I wonder if you can explain how you are using `dict_id` in Schema? That 
field in particular is quite old in the code, and @tustvold and I have 
discussed at various times removing it as it doesn't seem to be widely used / 
supported. However we couldn't convince ourselves that was a good idea -- 
mostly because we don't know what people are using it for
   > 
   > > By the way, if you think my draft implementation works, I'd be happy to 
continue implementing it and contributing to the community.
   > 
   > Thank you for your efforts so far -- I did look (briefly) at the code and 
while it follows the existing patterns for metadata / nullable it felt to me 
like it was going to be hard to ensure all cases were covered properly (aka it 
was going to be hard to use)
   
   Thank you for your reply. I see what you mean. Currently `dict_id` is only 
used for IPC. My point is that we could make further use of `dict_id` and 
`dict_is_ordered`. For some operations, such as `Eq`, if `left` and `right` 
have the same `dict_id`, we only need to compare whether the keys are the 
same (for string types, the per-value complexity changes: 
`O(max(left_string_length, right_string_length)) -> O(1)`). For sorting 
operations, further speedups are also possible based on `dict_is_ordered`. 
   Of course, this is a relatively big feature.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
