Noughmad opened a new pull request, #17832: URL: https://github.com/apache/datafusion/pull/17832
Add a new function, `regexp_extract`, that works like the function of the same name in Spark. See https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.regexp_extract.html

## Which issue does this PR close?

This is a test assignment to demonstrate my proficiency with Rust, as well as how quickly I can learn to work with a new codebase like DataFusion.

## Rationale for this change

The new function expands the functionality of DataFusion.

The code for the new function is largely copied from existing regex-related functions, namely `regexp_count` and `regexp_replace`. `regexp_count` takes more similar arguments, while `regexp_replace` has a more similar return value, so I combined the two. Copying from `regexp_match` and `regexp_like` was not useful because they are implemented in `arrow`, which does not have an equivalent of `regexp_extract`.

Only the core logic is newly written, and it is a single-line function, `extract_match_inner`. I also added a utility function to consolidate the handling of different array types.

## What changes are included in this PR?

* A new scalar function called `regexp_extract`.

## Are these changes tested?

* Yes, by a single test case. Test coverage will be expanded before submission.

## Are there any user-facing changes?

A new user-facing function has been added. It is documented, but I assume it also needs to be added to the changelog and user guide.
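
For readers unfamiliar with the Spark function, the behavior `regexp_extract` is meant to mirror can be sketched in a few lines. This is a minimal illustration in Python using the standard `re` module, not the Rust code from this PR. It assumes the semantics described in the linked Spark documentation: return the capture group numbered `idx`, or an empty string when the pattern or the group does not match. The NULL handling shown here is an assumption, and Python's regex syntax differs in places from the engines used by Spark and DataFusion.

```python
import re


def regexp_extract(value, pattern, idx):
    """Illustrative sketch of Spark-style regexp_extract semantics.

    This is a hypothetical helper for explanation only; it is not the
    DataFusion implementation.
    """
    # Assumption: a NULL input or pattern yields NULL (None here).
    if value is None or pattern is None:
        return None

    # Find the first match anywhere in the string.
    match = re.search(pattern, value)
    if match is None:
        # No match at all: empty string rather than NULL.
        return ""

    # Group 0 is the whole match; groups 1..n are capture groups.
    group = match.group(idx)

    # An optional group that did not participate in the match is None in
    # Python; map it to an empty string as well.
    return group if group is not None else ""
```

With this sketch, `regexp_extract("100-200", r"(\d+)-(\d+)", 1)` returns `"100"`, and a string that does not match the pattern yields `""`.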
