[PR] Add regexp extract function [datafusion]

via GitHub Sat, 18 Oct 2025 02:34:14 -0700


Noughmad opened a new pull request, #17832:
URL: https://github.com/apache/datafusion/pull/17832


   Add a new function, `regexp_extract`, that behaves like the Spark function of the same name. See https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.regexp_extract.html.
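For reference, the Spark semantics being matched can be modeled in a few lines. The sketch below is a Python model built on the standard `re` module, not the Rust code in this PR, and `regexp_extract` here is a hypothetical stand-in; Python's regex dialect also differs from Java's in places. The examples mirror those in the linked PySpark docs: the function returns group `idx` of the first match (index 0 is the whole match), and it returns an empty string when the pattern does not match or the requested group did not participate. Rejecting an out-of-range `idx` is an assumption about Spark's behavior.

```python
import re
from typing import Optional


def regexp_extract(s: Optional[str], pattern: str, idx: int) -> Optional[str]:
    """Model of Spark's regexp_extract for a single value.

    Returns group `idx` of the first match of `pattern` in `s` (idx 0 is the
    whole match). Returns "" when the pattern does not match or the requested
    group did not take part in the match. A null input stays null.
    """
    if s is None:
        return None
    regex = re.compile(pattern)
    # Assumption: reject group indices the pattern cannot produce.
    if idx < 0 or idx > regex.groups:
        raise ValueError(
            f"Regex group count is {regex.groups}, but the specified group index is {idx}"
        )
    m = regex.search(s)  # first match anywhere in s, not a full-string match
    if m is None:
        return ""
    group = m.group(idx)  # None if an optional group did not participate
    return "" if group is None else group


print(regexp_extract("100-200", r"(\d+)-(\d+)", 1))  # 100
print(repr(regexp_extract("aaaac", r"(a+)(b)?(c)", 2)))  # ''
```

The second call shows the case that most distinguishes this function from a plain match: the pattern matches overall, but the optional group `(b)?` does not, so the result is an empty string rather than null.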
   
   ## Which issue does this PR close?
   
   None. This is a test assignment to demonstrate my proficiency with Rust, as well as how quickly I can learn to work with a new codebase like DataFusion.
   
   ## Rationale for this change
   
   The new function adds Spark's `regexp_extract` to DataFusion's set of regex functions.
   
   The code for the new function is largely copied from two existing regex functions, `regexp_count` and `regexp_replace`. The arguments of `regexp_count` are closer to those of the new function, while the return type of `regexp_replace` is closer, so I combined the two. Basing it on `regexp_match` or `regexp_like` was not useful, because those are implemented in `arrow`, which has no equivalent of `regexp_extract`.
   
   The only genuinely new logic is the core matching step, which is a single-line function, `extract_match_inner`. I also added a utility function that consolidates the handling of different array types.
   
   ## What changes are included in this PR?
   
   * A new scalar function called `regexp_extract`.
   
   ## Are these changes tested?
   
   * Yes, by a single test case. The tests will be expanded before final submission.
   
   ## Are there any user-facing changes?
   
   Yes: a new user-facing function has been added. It is documented, but I assume it also needs to be added to the changelog and the user guide.

