Kimahriman commented on issue #841: URL: https://github.com/apache/datafusion-comet/issues/841#issuecomment-2299317899
> I believe this would be resolved by forwarding the `dictionaryProvider` into the `CometListVector` similarly to what was done with the `CometMapVector` and `CometStructVector` in this PR: https://github.com/apache/datafusion-comet/pull/789/files

I figured it would be simple. I can look into at least fixing that.

> To me this seems like a bug in the upstream datafusion implementation. Would it not be better to address the bug there? e.g. make that implementation have correct behavior around dictionary encoded types.

I agree this is mostly a DataFusion bug, but it's at least partially a Comet thing for choosing to make the dictionaries in the first place. Mostly I guess I'm asking whether it's worth coming up with a workaround, either for this specific issue or more generally for any dictionary-related issue, that will let things work until DataFusion handles dictionaries correctly, which might be non-trivial to fix.

> Seems like that would cause an unnecessary copy in the case of the `make_array`. To me it seems like we should just be "unpacking" the dictionary as part of the data getting written into the new array data structure. And other expressions might be able to do other optimizations by having data in the dictionary format, for example #504

I agree that is the best-case scenario. Since doing the unpacking as part of creating the array data structure might be a non-trivial fix (at least for someone like me who's just learning about DataFusion), is it worth adding a workaround that Comet expressions can opt in to, which would pre-unravel a dictionary before sending it into a DataFusion function, until a better, more permanent fix is figured out?

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
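As a rough illustration of what "pre-unraveling" a dictionary means in the comment above, here is a minimal sketch using plain Python lists. It does not use Arrow, DataFusion, or Comet APIs; the function name `unpack_dictionary` and the keys/values layout are assumptions made only for this example.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def unpack_dictionary(
    keys: Sequence[Optional[int]], values: Sequence[T]
) -> List[Optional[T]]:
    """Materialize a dictionary-encoded column into a plain list.

    `keys` holds one index into `values` per row; None marks a null row.
    This is a toy model (hypothetical helper), not a real Arrow kernel.
    """
    return [None if k is None else values[k] for k in keys]


# Toy dictionary-encoded column: three distinct strings, five rows.
values = ["a", "b", "c"]
keys = [0, 2, None, 0, 1]

# The unpacked column is what a function without dictionary support
# would receive under the proposed opt-in workaround.
plain = unpack_dictionary(keys, values)
print(plain)  # ['a', 'c', None, 'a', 'b']
```

The trade-off raised in the thread is visible even here: unpacking copies every row's value, so it gives up the compactness of the dictionary form in exchange for compatibility with code that only understands plain arrays.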
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org