Shekharrajak opened a new pull request, #2991: URL: https://github.com/apache/datafusion-comet/pull/2991
## Which issue does this PR close?

Closes #2972.

## Rationale for this change

The `contains` expression performs poorly in Comet (roughly 0.2x Spark's performance) because DataFusion's `make_scalar_function` wrapper expands scalar patterns into arrays, which bypasses arrow-rs's optimized scalar path.

## What changes are included in this PR?

* Add a `SparkContains` UDF with optimized scalar-pattern handling that uses `memchr::memmem::Finder` for SIMD-accelerated substring search
* Register the function in `comet_scalar_funcs.rs` to override DataFusion's built-in `contains`
* Add `contains` to `CometStringExpressionBenchmark`
* Enhance the `contains` test in `CometExpressionSuite`

## How are these changes tested?

* 4 new unit tests in `contains.rs` (array-scalar, scalar-scalar, null handling, empty pattern)
* Enhanced integration test in `CometExpressionSuite.scala`
* All 122 spark-expr tests pass
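The core idea of the scalar-pattern path, building the searcher once per batch and reusing it for every row instead of expanding the pattern into an array, can be sketched as follows. This is a minimal, dependency-free illustration, not the PR's code: `PatternFinder` and `contains_array_scalar` are hypothetical names, a slice of `Option<&str>` stands in for an Arrow `StringArray` with its null bitmap, and Knuth-Morris-Pratt stands in for the SIMD-accelerated `memchr::memmem::Finder` the PR reportedly uses. The null and empty-pattern behavior mirrors the cases listed in the unit tests (a NULL input gives NULL; an empty pattern matches every non-null string).

```rust
/// Hypothetical reusable substring searcher. KMP is used here only so the
/// sketch needs no external crates; the PR describes memchr::memmem::Finder.
struct PatternFinder {
    needle: Vec<u8>,
    // failure[i] = length of the longest proper prefix of needle[..=i]
    // that is also a suffix of it (the KMP prefix table).
    failure: Vec<usize>,
}

impl PatternFinder {
    /// Precomputes the prefix table once for the scalar pattern.
    fn new(needle: &str) -> Self {
        let needle = needle.as_bytes().to_vec();
        let mut failure = vec![0; needle.len()];
        let mut k = 0;
        for i in 1..needle.len() {
            while k > 0 && needle[i] != needle[k] {
                k = failure[k - 1];
            }
            if needle[i] == needle[k] {
                k += 1;
            }
            failure[i] = k;
        }
        PatternFinder { needle, failure }
    }

    /// Byte-level search; valid for UTF-8 because UTF-8 is self-synchronizing.
    fn is_match(&self, haystack: &str) -> bool {
        if self.needle.is_empty() {
            return true; // an empty pattern matches every string
        }
        let mut k = 0;
        for &b in haystack.as_bytes() {
            while k > 0 && b != self.needle[k] {
                k = self.failure[k - 1];
            }
            if b == self.needle[k] {
                k += 1;
                if k == self.needle.len() {
                    return true;
                }
            }
        }
        false
    }
}

/// Hypothetical evaluation of contains(column, scalar_pattern).
/// A NULL pattern yields NULL for every row; a NULL row yields NULL.
fn contains_array_scalar(haystacks: &[Option<&str>], pattern: Option<&str>) -> Vec<Option<bool>> {
    match pattern {
        None => vec![None; haystacks.len()],
        Some(p) => {
            // Build the searcher once and reuse it across all rows, rather
            // than materializing an array of identical patterns.
            let finder = PatternFinder::new(p);
            haystacks.iter().map(|&h| h.map(|s| finder.is_match(s))).collect()
        }
    }
}

fn main() {
    let rows = [Some("spark comet"), None, Some("datafusion"), Some("")];
    let out = contains_array_scalar(&rows, Some("comet"));
    println!("{:?}", out); // [Some(true), None, Some(false), Some(false)]
}
```

The design point is that the per-pattern preprocessing cost is paid once per batch rather than once per row, which is what an array-expanded scalar loses.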
