tanruixiang commented on issue #6784: URL: https://github.com/apache/arrow-datafusion/issues/6784#issuecomment-1627577672
> I wonder if you can explain how you are using `dict_id` in Schema? That field in particular is quite old in the code, and @tustvold and I have discussed at various times removing it as it doesn't seem to be widely used / supported. However we couldn't convince ourselves that was a good idea -- mostly because we don't know what people are using it for
>
> > By the way, if you think my draft implementation works, I'd be happy to continue implementing it and contributing to the community.
>
> Thank you for your efforts so far -- I did look (briefly) at the code and while it follows the existing patterns for metadata / nullable it felt to me like it was going to be hard to ensure all cases were covered properly (aka it was going to be hard to use)

Thank you for your reply. I understand what you mean.

Currently `dict_id` is only used for IPC. My point is that we could make further use of `dict_id` and `dict_is_ordered`. For some operations, such as `Eq`, if `left` and `right` have the same `dict_id`, we only need to compare whether the keys are the same (for string values, the cost of each comparison changes from `O(max(left_string_length, right_string_length))` to `O(1)`). For sorting operations, further speedups are also possible based on `dict_is_ordered`.

Of course, this is a relatively big feature.
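To make the idea concrete, here is a minimal sketch in plain Rust. It does not use the arrow-rs API: `DictColumn`, `dict_eq`, and `sort_indices` are hypothetical names introduced only for illustration. It also assumes that two columns with the same `dict_id` share identical, deduplicated dictionary values, and that `dict_is_ordered` means the dictionary values are sorted ascending.

```rust
/// Hypothetical, simplified dictionary-encoded string column (NOT the arrow-rs type).
#[derive(Debug, Clone)]
pub struct DictColumn {
    /// Assumption: equal ids imply identical, deduplicated `values`.
    pub dict_id: i64,
    /// Assumption: true means `values` is sorted ascending.
    pub dict_is_ordered: bool,
    /// The dictionary itself.
    pub values: Vec<String>,
    /// One index into `values` per row.
    pub keys: Vec<usize>,
}

impl DictColumn {
    /// Decode the string stored at `row`.
    fn value(&self, row: usize) -> &str {
        &self.values[self.keys[row]]
    }
}

/// Element-wise equality of two columns of the same length.
pub fn dict_eq(left: &DictColumn, right: &DictColumn) -> Vec<bool> {
    assert_eq!(left.keys.len(), right.keys.len(), "columns must have equal length");
    if left.dict_id == right.dict_id {
        // Fast path: same dictionary, so comparing integer keys is enough (O(1) per row).
        left.keys.iter().zip(&right.keys).map(|(l, r)| l == r).collect()
    } else {
        // Fallback: different dictionaries, so compare the decoded strings.
        (0..left.keys.len())
            .map(|i| left.value(i) == right.value(i))
            .collect()
    }
}

/// Row indices that would sort the column ascending by value (stable).
pub fn sort_indices(col: &DictColumn) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..col.keys.len()).collect();
    if col.dict_is_ordered {
        // Fast path: key order equals value order, so sort on the integer keys.
        idx.sort_by_key(|&i| col.keys[i]);
    } else {
        // Fallback: compare the decoded strings.
        idx.sort_by(|&a, &b| col.value(a).cmp(col.value(b)));
    }
    idx
}
```

The fast paths are only correct under the stated assumptions. In particular, if a dictionary contained duplicate values, two different keys could decode to equal strings and the key-only comparison would report them as unequal.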
