tanruixiang commented on issue #6784: URL: https://github.com/apache/arrow-datafusion/issues/6784#issuecomment-1616345905
> I would love to know what you think about dict_id handling in general -- from what I can see so far it is not well supported in arrow-rs. We have similar problems with `metadata` which can be hung off a schema or a field and gets lost frequently
>
> I am also not 100% clear if dict_id is supposed to be (potentially) different per record batch or if it would be the same for the entire plan
>
> One thing that might be possible is to compare the pointer for the dictionary array to decide if it was the same dictionary rather than trying to keep `dict_id` all the way through the plan 🤔

Thank you very much for your reply. I think that, given a reasonable `provider`, `dict_id` should not be lost when building the plan. `dict_id` sits at the same level as other members such as `name` (both are in `pub struct Field`), and since `name` is not lost, `dict_id` should not be lost either. Wherever we can get the `name`, we can also get the `dict_id`, so we should keep the `dict_id` at least for the whole process of building the plan. (In particular, I think it is always better to keep the `dict_id` wherever possible than to discard it and fall back to the default value.)
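To illustrate the point, here is a minimal sketch. It assumes a simplified, hypothetical `Field` (not the real arrow-rs type) with only `name` and `dict_id`, and it assumes the fallback `dict_id` is `0`. The helper names `rebuild_lossy`, `rebuild_preserving`, and `same_dictionary` are made up for this example. The last helper sketches the pointer-comparison idea from the quoted comment, using a plain `Arc<Vec<String>>` as a stand-in for a dictionary array.

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for arrow's `Field`; NOT the real type.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub dict_id: i64,
}

/// Rebuilds a field from its name alone, so `dict_id` silently falls back
/// to an assumed default of 0. This is the kind of loss described above.
pub fn rebuild_lossy(field: &Field) -> Field {
    Field {
        name: field.name.clone(),
        dict_id: 0,
    }
}

/// Carries every member over, so `dict_id` survives alongside `name`.
pub fn rebuild_preserving(field: &Field) -> Field {
    field.clone()
}

/// Pointer comparison: two handles refer to the same dictionary only if they
/// share one allocation, regardless of whether the contents are equal.
pub fn same_dictionary(a: &Arc<Vec<String>>, b: &Arc<Vec<String>>) -> bool {
    Arc::ptr_eq(a, b)
}
```

With a field whose `dict_id` is 7, `rebuild_lossy` returns a field with `dict_id` 0 while `rebuild_preserving` returns an identical field. Note that `same_dictionary` treats two separately allocated dictionaries with equal contents as different, which may or may not be the desired behavior.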
