[GitHub] [arrow] seddonm1 commented on pull request #9428: ARROW-10354: [Rust][DataFusion] regexp_extract function to select regex groups from strings

GitBox Mon, 08 Mar 2021 13:01:35 -0800


seddonm1 commented on pull request #9428:
URL: https://github.com/apache/arrow/pull/9428#issuecomment-793074013



   Hi @sweb 
   
   Yesterday I opened my PR for `regexp_replace` (and other functions); see 
https://github.com/apache/arrow/pull/9654/files#diff-c122a83600dc86aa69067fdfdca1e0349616dfae73eaad8a8c90d5e69dbf7a3c
 You can see there how I am passing in the values. I think there is an 
opportunity to use more `lazy_static` to precompile any standard regexes in 
that file and reduce runtime cost, but we need to see all the functions before 
doing too much optimisation.
   
   The way I have written the `regexp_replace` code (and I am not saying it is 
the best way) reflects that each row can potentially be different, because any 
argument to a function in Postgres can be supplied by referencing a column. I 
have tried to balance that cost by memoizing the Regex objects. I wrote a lot 
of this code before @jorgecarleitao made some large changes in `functions.rs` 
relating to Scalar vs Columnar, so there may be a second pass to optimise this 
once we get the basic functionality working.
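
   As a rough illustration of the memoization idea above (not the code in the 
PR), here is a minimal sketch that compiles each distinct pattern only once, 
even when the pattern comes from a column and can change per row. The names 
`PatternCache`, `get_or_compile` and `count_compilations` are hypothetical, and 
the compile step is a stand-in closure so the sketch has no dependencies; in 
practice it would be something like `regex::Regex::new`.

```rust
use std::collections::HashMap;

/// Hypothetical helper: holds one compiled value per distinct pattern string.
struct PatternCache<T> {
    compiled: HashMap<String, T>,
}

impl<T> PatternCache<T> {
    fn new() -> Self {
        PatternCache {
            compiled: HashMap::new(),
        }
    }

    /// Returns the cached value for `pattern`, calling `compile` only the
    /// first time this pattern is seen.
    fn get_or_compile<E>(
        &mut self,
        pattern: &str,
        compile: impl FnOnce(&str) -> Result<T, E>,
    ) -> Result<&T, E> {
        if !self.compiled.contains_key(pattern) {
            let value = compile(pattern)?;
            self.compiled.insert(pattern.to_string(), value);
        }
        Ok(&self.compiled[pattern])
    }
}

/// Walks a "column" of per-row patterns and reports how many times the
/// (stand-in) compile step actually ran.
fn count_compilations(patterns: &[&str]) -> usize {
    let mut cache = PatternCache::new();
    let mut compilations = 0;
    for pattern in patterns {
        cache
            .get_or_compile(pattern, |p| -> Result<String, ()> {
                // Assumption: stand-in for an expensive regex compilation.
                compilations += 1;
                Ok(p.to_uppercase())
            })
            .unwrap();
    }
    compilations
}
```

   With a column where most rows share the same pattern, the compile step runs 
once per distinct pattern rather than once per row; the trade-off is a hash 
lookup per row and memory for the cached objects.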
   
   I think `regexp_match` is the way to go (as per Postgres), which returns a 
list of string values. We will then need to look at the sqlparser to add the 
ability to 'extract' values from the list by index, e.g. `[0]` (I was thinking 
of doing this soon). I need to experiment in Postgres to fully understand the 
behaviour (for example, what happens if you reference a non-existent index).


