alamb opened a new pull request #8460: URL: https://github.com/apache/arrow/pull/8460
This PR incorporates the feedback from @nevi-me and @jorgecarleitao on https://github.com/apache/arrow/pull/8400

It adds:

1. A `can_cast_types` function to the Arrow cast kernel (as suggested by @jorgecarleitao / @nevi-me in https://github.com/apache/arrow/pull/8400#discussion_r501850814) that encodes the valid type casts
2. A test that ensures `can_cast_types` and `cast` remain in sync
3. Bug fixes that the test above uncovered (I'll comment inline)
4. A change to DataFusion to use `can_cast_types` so that it plans casting consistently with what Arrow allows

Previously the notions of coercion and casting were somewhat conflated in DataFusion. I have tried to clarify them in https://github.com/apache/arrow/pull/8399 and this PR. See also https://github.com/apache/arrow/pull/8340#discussion_r501257096 for more discussion.

I am adding this functionality so DataFusion gains rudimentary support for `DictionaryArray`.

Codewise, I am concerned about the duplication in logic between the match statements in `cast` and `can_cast_types`. I have some thoughts on how to unify them (see https://github.com/apache/arrow/pull/8400#discussion_r504278902), but I don't have time to implement that as it is a bigger change. I think this approach with some duplication is ok, and the test will ensure they remain in sync.
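
To illustrate the idea of a test that keeps `can_cast_types` and `cast` in sync, here is a minimal, self-contained sketch. It is a toy model and not the code in this PR: `ToyType`, `ToyValue`, `sample_value`, and `find_mismatches` are hypothetical names, and the toy `can_cast_types` and `cast` only borrow the names of the Arrow functions without using the Arrow crate or its signatures. It also assumes a cast can only fail because of the type pair, never because of a particular value.

```rust
// Toy model only: these types and functions are hypothetical stand-ins,
// not the Arrow crate's `DataType`, `cast`, or `can_cast_types`.

#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyType {
    Int32,
    Float64,
    Utf8,
    Boolean,
}

const ALL_TYPES: [ToyType; 4] = [
    ToyType::Int32,
    ToyType::Float64,
    ToyType::Utf8,
    ToyType::Boolean,
];

#[derive(Debug, Clone, PartialEq)]
enum ToyValue {
    Int32(i32),
    Float64(f64),
    Utf8(String),
    Boolean(bool),
}

/// Type-level check: is a cast from `from` to `to` supported at all?
fn can_cast_types(from: ToyType, to: ToyType) -> bool {
    use ToyType::*;
    match (from, to) {
        (a, b) if a == b => true,
        (Int32, Float64) | (Float64, Int32) => true,
        (Int32, Utf8) | (Float64, Utf8) | (Boolean, Utf8) => true,
        (Boolean, Int32) => true,
        _ => false,
    }
}

/// Value-level cast. Its match arms duplicate the pairs listed above,
/// which is exactly the duplication the sync test guards.
fn cast(value: &ToyValue, to: ToyType) -> Result<ToyValue, String> {
    use ToyValue as V;
    match (value, to) {
        (V::Int32(v), ToyType::Int32) => Ok(V::Int32(*v)),
        (V::Float64(v), ToyType::Float64) => Ok(V::Float64(*v)),
        (V::Utf8(s), ToyType::Utf8) => Ok(V::Utf8(s.clone())),
        (V::Boolean(b), ToyType::Boolean) => Ok(V::Boolean(*b)),
        (V::Int32(v), ToyType::Float64) => Ok(V::Float64(*v as f64)),
        (V::Float64(v), ToyType::Int32) => Ok(V::Int32(*v as i32)),
        (V::Int32(v), ToyType::Utf8) => Ok(V::Utf8(v.to_string())),
        (V::Float64(v), ToyType::Utf8) => Ok(V::Utf8(v.to_string())),
        (V::Boolean(b), ToyType::Utf8) => Ok(V::Utf8(b.to_string())),
        (V::Boolean(b), ToyType::Int32) => Ok(V::Int32(*b as i32)),
        (v, t) => Err(format!("unsupported cast from {:?} to {:?}", v, t)),
    }
}

/// One representative value per type, used to exercise `cast`.
fn sample_value(t: ToyType) -> ToyValue {
    match t {
        ToyType::Int32 => ToyValue::Int32(1),
        ToyType::Float64 => ToyValue::Float64(1.5),
        ToyType::Utf8 => ToyValue::Utf8("a".to_string()),
        ToyType::Boolean => ToyValue::Boolean(true),
    }
}

/// Returns every (from, to) pair where `can_cast_types` and `cast` disagree.
fn find_mismatches() -> Vec<(ToyType, ToyType)> {
    let mut mismatches = Vec::new();
    for &from in ALL_TYPES.iter() {
        for &to in ALL_TYPES.iter() {
            let predicted = can_cast_types(from, to);
            let actual = cast(&sample_value(from), to).is_ok();
            if predicted != actual {
                mismatches.push((from, to));
            }
        }
    }
    mismatches
}
```

A unit test in this style would assert that `find_mismatches()` is empty. If someone adds an arm to one match statement but not the other, the offending type pair shows up in the returned list.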
