Kimahriman commented on issue #841:
URL: 
https://github.com/apache/datafusion-comet/issues/841#issuecomment-2299317899

   > I believe this would be resolved by forwarding the `dictionaryProvider` 
into the `CometListVector` similarly to what was done with the `CometMapVector` 
and `CometStructVector` in this PR: 
https://github.com/apache/datafusion-comet/pull/789/files
   
   I figured it would be simple; I can look into at least fixing that.
   
   > To me this seems like a bug in the upstream datafusion implementation. 
Would it not be better to address the bug there? e.g make that implementation 
have correct behavior around dictionary encoded types.
   
   I agree this is mostly a DataFusion bug, but it's at least partially a Comet 
issue, since Comet chooses to create the dictionaries in the first place. Mostly 
I guess I'm asking whether it's worth coming up with a workaround, for this 
specific issue or more generally for any dictionary-related issue, that will let 
things work until DataFusion handles dictionaries correctly, which might be 
non-trivial to fix.
   
   > Seems like that would cause an unnecessary copy in the case of the 
`make_array`. To me it seems like we should just be "unpacking" the dictionary 
as part of the data getting written into the new array data structure. And 
other expressions might be able to do other optimizations by having data in the 
dictionary format, for example #504.
   
   I agree that is the best-case scenario. Since doing the unpacking as part of 
creating the array data structure might be a non-trivial fix (at least for 
someone like me who's just learning about DataFusion), is it worth adding an 
opt-in workaround that Comet expressions can use to unpack a dictionary before 
passing it to a DataFusion function, until a better, more permanent fix is 
figured out?
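   
   A minimal sketch of what such an opt-in workaround could look like, written 
as a toy Python model rather than against the real Arrow, DataFusion, or Comet 
APIs. Every name here (`PlainColumn`, `DictionaryColumn`, `unpack_dictionary`, 
`call_with_unpacked_args`, and the stand-in `make_array`) is hypothetical and 
only illustrates the idea of unpacking dictionary-encoded inputs before a 
function that cannot handle them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Union


@dataclass
class PlainColumn:
    """Toy column that stores every value directly (None means null)."""
    values: List[Optional[str]]


@dataclass
class DictionaryColumn:
    """Toy dictionary-encoded column: each key indexes into `dictionary`."""
    keys: List[Optional[int]]  # a None key represents a null row
    dictionary: List[str]


Column = Union[PlainColumn, DictionaryColumn]


def unpack_dictionary(column: Column) -> PlainColumn:
    """Materialize a dictionary-encoded column; plain columns pass through."""
    if isinstance(column, DictionaryColumn):
        return PlainColumn(
            [None if k is None else column.dictionary[k] for k in column.keys]
        )
    return column


def call_with_unpacked_args(
    func: Callable[..., object], args: Sequence[Column], unpack: bool = True
) -> object:
    """Hypothetical opt-in wrapper: unpack dictionaries only when requested."""
    if unpack:
        args = [unpack_dictionary(a) for a in args]
    return func(*args)


def make_array(*columns: Column) -> List[List[Optional[str]]]:
    """Stand-in for a function that only understands plain columns."""
    for c in columns:
        if not isinstance(c, PlainColumn):
            raise TypeError("dictionary-encoded input is not supported")
    # Build one list per row from the values of each input column.
    return [list(row) for row in zip(*(c.values for c in columns))]


if __name__ == "__main__":
    encoded = DictionaryColumn(keys=[0, 1, 0, None], dictionary=["x", "y"])
    plain = PlainColumn(["p", "q", "r", "s"])
    print(call_with_unpacked_args(make_array, [encoded, plain]))
```

   In this model the unpacking step makes a full copy of the values, which is 
the cost raised in the quoted comment; the wrapper only trades that copy for 
correctness in expressions that opt in.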


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

