mbutrovich commented on PR #17195: URL: https://github.com/apache/datafusion/pull/17195#issuecomment-3271252630
> Can you explain what is happening in Comet? For example, is `regexp_replace` being called with `regexp_replace(Utf8View, Utf8View, Utf8View)`?

In my experimental branch adding `StringView` support to Comet, we need a way to represent string literals during serialization from the Spark side to DataFusion. Currently all string literals come over as `Utf8`, and that just works. However, with `Utf8View` columns coming out of the Parquet reader, Arrow complains that it cannot evaluate filter expressions with mismatched types.

I changed all string literals to be `Utf8View`, which doesn't really change anything underneath for single `ScalarValue`s. Now, however, I have problems with functions like `regexp_replace`, which expect literals to be `Utf8`. Since Comet does not use DataFusion's front-end, we don't get the cast operations inserted into the plan that the signature logic is designed for.

I am increasingly of the mind that Comet needs to start doing some passes over the physical plan, and type coercion like this might be one reason.

I think this PR is good to go, but I am also okay if we think it's needless complexity.
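To make the idea of a type-coercion pass concrete, here is a minimal sketch in plain Rust. It is a toy model only: `DataType`, `Expr`, `coerce`, `coerce_to`, and `demo` are hypothetical names invented for illustration and are not DataFusion's or Comet's actual types or APIs. It assumes each function call declares a single argument type, retags mismatched string literals in place, and wraps any other mismatched argument in a cast.

```rust
// Toy model: hypothetical stand-ins, NOT DataFusion's or Comet's real API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataType {
    Utf8,
    Utf8View,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// A column reference with a known type.
    Column { name: String, data_type: DataType },
    /// A string literal tagged with the type it was serialized as.
    Literal { value: String, data_type: DataType },
    /// An explicit cast, as a coercion pass would insert.
    Cast { expr: Box<Expr>, to: DataType },
    /// A call whose arguments must all have type `arg_type`.
    /// Toy simplification: the call is also assumed to return `arg_type`.
    Call { name: String, arg_type: DataType, args: Vec<Expr> },
}

impl Expr {
    /// The type this expression evaluates to in the toy model.
    pub fn data_type(&self) -> DataType {
        match self {
            Expr::Column { data_type, .. } | Expr::Literal { data_type, .. } => *data_type,
            Expr::Cast { to, .. } => *to,
            Expr::Call { arg_type, .. } => *arg_type,
        }
    }
}

/// Bottom-up rewrite: after this pass, every call argument has the
/// type that its enclosing call declares.
pub fn coerce(expr: Expr) -> Expr {
    match expr {
        Expr::Call { name, arg_type, args } => {
            let args: Vec<Expr> = args
                .into_iter()
                .map(coerce) // fix nested calls first
                .map(|arg| coerce_to(arg, arg_type))
                .collect();
            Expr::Call { name, arg_type, args }
        }
        Expr::Cast { expr: inner, to } => Expr::Cast { expr: Box::new(coerce(*inner)), to },
        leaf => leaf,
    }
}

/// Make a single argument match `expected`.
fn coerce_to(arg: Expr, expected: DataType) -> Expr {
    if arg.data_type() == expected {
        return arg;
    }
    match arg {
        // A literal can simply be retagged; no runtime cast is needed.
        Expr::Literal { value, .. } => Expr::Literal { value, data_type: expected },
        // Anything else is wrapped in an explicit cast.
        other => Expr::Cast { expr: Box::new(other), to: expected },
    }
}

/// Example input: a `regexp_replace`-like call that (in this toy) wants
/// `Utf8` arguments but receives a `Utf8View` column and `Utf8View` literals.
pub fn demo() -> Expr {
    let view_lit = |s: &str| Expr::Literal { value: s.to_string(), data_type: DataType::Utf8View };
    coerce(Expr::Call {
        name: "regexp_replace".to_string(),
        arg_type: DataType::Utf8,
        args: vec![
            Expr::Column { name: "c0".to_string(), data_type: DataType::Utf8View },
            view_lit("a+"),
            view_lit("b"),
        ],
    })
}
```

Calling `demo()` returns the rewritten call: the column argument ends up wrapped in a `Cast` to `Utf8`, while the two literals are retagged as `Utf8` without a cast. A real pass would have to consult each function's actual signature and return type rather than a single `arg_type` field.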
