sweb commented on pull request #9428:
URL: https://github.com/apache/arrow/pull/9428#issuecomment-792942581


   > @sweb I can help on Monday. I'm planning to raise the PR for those other 
regexp functions then can help work through this?
   
   Hey @seddonm1, I have rebased my PR on the current master. My plan is to 
keep only `regexp_match` and remove `regexp_extract`. However, since there 
are review comments that I have not addressed yet, I do not want to remove 
`regexp_extract` until I am sure that `regexp_match` is the way to go.
   
   My main issues are:
   
   * What is the correct way to pass the regular expression into the DataFusion 
function? I have seen in your PR that you treat it as a normal column 
(`ArrayRef`), compile all distinct regular expressions and apply the relevant 
one to each row. I have to admit that I did not think of this use case, but of 
course it is much cleaner than my assumption (fingers crossed) that the 
pattern would always be a literal. However, I am not sure that I would expect 
this API from a kernel function, i.e. passing the regex as a `StringArray` 
instead of a `&str`.
   * What do you think about using `ListArray` in `regexp_match` and simply 
defining the return type of the corresponding DataFusion functions as `List`? 
I have no experience in using DataFusion (yet), so I am not sure whether this 
makes sense or whether I have to add some kind of handling for accessing list 
elements to make this usable.
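
   To make the two questions above concrete, here is a rough, dependency-free 
sketch of the two API shapes (pattern as a column vs. pattern as a `&str`). 
Everything in it is an assumption for illustration only: plain vectors of 
`Option` stand in for `StringArray` and `ListArray`, the `Matcher` type and 
`compile_literal` are hypothetical placeholders for a real regex engine, and 
the function names are not the ones in either PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a compiled regular expression: returns the
/// captured groups of the first match, or `None` when nothing matches.
type Matcher = Box<dyn Fn(&str) -> Option<Vec<String>>>;

/// Toy "compiler" that treats the pattern as a literal substring and reports
/// it as the single captured group. A real kernel would compile a regex here,
/// which can fail, hence the `Result`.
fn compile_literal(pattern: &str) -> Result<Matcher, String> {
    let needle = pattern.to_string();
    let matcher: Matcher = Box::new(move |haystack: &str| {
        haystack.find(needle.as_str()).map(|_| vec![needle.clone()])
    });
    Ok(matcher)
}

/// Column variant: `values[i]` is matched against `patterns[i]`.
/// A `None` entry models a null slot; the output has one optional list of
/// strings per row, which is the shape a list-typed result would carry.
fn regexp_match_column<F>(
    values: &[Option<&str>],
    patterns: &[Option<&str>],
    mut compile: F,
) -> Result<Vec<Option<Vec<String>>>, String>
where
    F: FnMut(&str) -> Result<Matcher, String>,
{
    if values.len() != patterns.len() {
        return Err("values and patterns must have the same length".to_string());
    }
    // Compile each distinct pattern once and reuse it for later rows.
    let mut cache: HashMap<String, Matcher> = HashMap::new();
    let mut out = Vec::with_capacity(values.len());
    for (value, pattern) in values.iter().zip(patterns.iter()) {
        let row = match (*value, *pattern) {
            (Some(v), Some(p)) => {
                if !cache.contains_key(p) {
                    cache.insert(p.to_string(), compile(p)?);
                }
                cache.get(p).and_then(|m| m(v))
            }
            // A null value or a null pattern yields a null row.
            _ => None,
        };
        out.push(row);
    }
    Ok(out)
}

/// Scalar variant: a single `&str` pattern, broadcast to every row and
/// delegated to the column variant (so it is compiled only once).
fn regexp_match_scalar<F>(
    values: &[Option<&str>],
    pattern: &str,
    compile: F,
) -> Result<Vec<Option<Vec<String>>>, String>
where
    F: FnMut(&str) -> Result<Matcher, String>,
{
    let patterns = vec![Some(pattern); values.len()];
    regexp_match_column(values, &patterns, compile)
}
```

   In this sketch the `&str` variant is just a thin wrapper around the column 
variant, so a kernel could in principle offer both entry points. Note that the 
sketch does not distinguish a null input from a row without a match; both end 
up as a null row.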


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

