[GitHub] [arrow-datafusion] tanruixiang commented on issue #6784: The `dict_id` was lost when constructing the logic plan.

via GitHub Sat, 01 Jul 2023 21:02:20 -0700


tanruixiang commented on issue #6784:
URL: https://github.com/apache/arrow-datafusion/issues/6784#issuecomment-1616345905


   > I would love to know what you think about dict_id handling in general -- from what I can see so far it is not well supported in arrow-rs. We have similar problems with `metadata`, which can be hung off a schema or a field and gets lost frequently.
   > 
   > I am also not 100% clear if dict_id is supposed to (potentially) be different per record batch or if it would be the same for the entire plan.
   > 
   > One thing that might be possible is to compare the pointer for the dictionary array to decide if it was the same dictionary rather than trying to keep `dict_id` all the way through the plan 🤔
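The pointer-comparison idea in the quoted reply can be sketched roughly as follows. This is a minimal, std-only model, not the arrow-rs API: `DictColumn` and `same_dictionary` are hypothetical names, and the sketch assumes (as in arrow-rs) that dictionary values are held behind a reference-counted pointer, so `Arc::ptr_eq` can tell whether two columns share one dictionary allocation.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for a dictionary-encoded column (not the
// arrow-rs `DictionaryArray` API): each row stores a key, and the dictionary
// values are shared behind an `Arc`.
struct DictColumn {
    keys: Vec<i32>,
    values: Arc<Vec<String>>,
}

// Two columns are treated as using "the same dictionary" only if their values
// point at the same allocation. Dictionaries that are equal in content but
// were allocated separately are reported as different.
fn same_dictionary(a: &DictColumn, b: &DictColumn) -> bool {
    Arc::ptr_eq(&a.values, &b.values)
}

fn main() {
    let dict = Arc::new(vec!["a".to_string(), "b".to_string()]);
    // Two batches sharing one dictionary allocation.
    let batch1 = DictColumn { keys: vec![0, 1, 0], values: Arc::clone(&dict) };
    let batch2 = DictColumn { keys: vec![1, 1], values: Arc::clone(&dict) };
    // Same dictionary contents, but a separate allocation.
    let batch3 = DictColumn {
        keys: vec![0],
        values: Arc::new(vec!["a".to_string(), "b".to_string()]),
    };

    println!("batch1 rows: {}", batch1.keys.len());
    println!("{}", same_dictionary(&batch1, &batch2)); // true
    println!("{}", same_dictionary(&batch1, &batch3)); // false
}
```

The trade-off is that dictionaries with identical contents built independently compare as different, so this answers only "is it literally the same dictionary object", without needing a `dict_id` to survive through the plan.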
   
   Thank you very much for your reply. I think that, once a reasonable `provider` is supplied, `dict_id` should not be lost when building the plan: `dict_id` sits at the same level as other attributes such as `name` (in `pub struct Field`), and since `name` is not lost, `dict_id` should not be lost either.
   Wherever we obtain the `name`, we can obtain the `dict_id` as well, so we should keep the `dict_id` at least throughout the whole process of building the plan. (In general, I think it is always better to preserve the `dict_id` as far as possible than to discard it and fall back to the default value.)
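The argument that `dict_id` can travel wherever `name` does might be illustrated like this. It is a std-only sketch, not arrow-rs or DataFusion code: `SimpleField`, `alias_lossy` and `alias_preserving` are hypothetical names, the struct omits the data type and metadata, and the default `dict_id` of 0 is an assumption of this model. It contrasts rebuilding a field from scratch (which resets `dict_id`) with copying every attribute except the one being changed.

```rust
// Hypothetical, simplified model of a schema field (not the arrow-rs `Field`
// API). As the comment notes for `pub struct Field`, `dict_id` lives next to
// `name`, so any code that can read one can read the other.
#[derive(Debug, Clone)]
struct SimpleField {
    name: String,
    nullable: bool,
    dict_id: i64,
}

impl SimpleField {
    // A constructor that takes no dict_id and falls back to a default of 0
    // (the default is an assumption of this sketch).
    fn new(name: &str, nullable: bool) -> Self {
        SimpleField { name: name.to_string(), nullable, dict_id: 0 }
    }
}

// Lossy: rebuilds the field under a new name from scratch, so dict_id is
// silently reset to the default.
fn alias_lossy(f: &SimpleField, new_name: &str) -> SimpleField {
    SimpleField::new(new_name, f.nullable)
}

// Preserving: changes only the name and copies every other attribute,
// including dict_id, from the input field.
fn alias_preserving(f: &SimpleField, new_name: &str) -> SimpleField {
    SimpleField { name: new_name.to_string(), ..f.clone() }
}

fn main() {
    let input = SimpleField { name: "tag".to_string(), nullable: true, dict_id: 42 };
    let lossy = alias_lossy(&input, "t");
    let kept = alias_preserving(&input, "t");
    println!("{} {}", lossy.name, lossy.dict_id); // t 0
    println!("{} {}", kept.name, kept.dict_id); // t 42
}
```

Using struct update syntax (`..f.clone()`) means any attribute added to the field later is carried over by default instead of being reset, which is the "keep it as much as possible" behavior argued for above.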


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
