Jefffrey opened a new issue, #3803:
URL: https://github.com/apache/arrow-rs/issues/3803

   **Describe the bug**
   
   In some cases `regexp_match` skips the first (and only) match.
   
   For example, if the pattern is `foo` and the string to match is `foo`, it should 
return a single match, `foo`. Currently it returns an empty array for the match (it 
correctly finds that there is a match, but doesn't return the matched text).
   
   **To Reproduce**
   
   Example test in 
[arrow-string/src/regexp.rs](https://github.com/apache/arrow-rs/blob/79518cf67a6dd5fc391e271fd92c0c21ee7e8a74/arrow-string/src/regexp.rs)
   
   ```rust
       #[test]
       fn sandbox() {
           // Pattern `foo` has no explicit capture groups, so only group 0 exists.
           let array = StringArray::from(vec![Some("foo")]);
           let pattern = GenericStringArray::<i32>::from(vec![r"foo"]);
           let actual = regexp_match(&array, &pattern, None).unwrap();
           let result = actual.as_any().downcast_ref::<ListArray>().unwrap();

           // Expected: a single list entry containing the whole match, `foo`.
           let elem_builder: GenericStringBuilder<i32> = GenericStringBuilder::new();
           let mut expected_builder = ListBuilder::new(elem_builder);
           expected_builder.values().append_value("foo");
           expected_builder.append(true);
           let expected = expected_builder.finish();

           // Currently fails: `result` holds an empty list instead.
           assert_eq!(&expected, result);
       }
   ```
   
   **Expected behavior**
   
   Test should succeed.
   
   **Additional context**
   
   This seems to happen because the first capture is always skipped when collecting matches:
   
   
https://github.com/apache/arrow-rs/blob/79518cf67a6dd5fc391e271fd92c0c21ee7e8a74/arrow-string/src/regexp.rs#L210-L218
   
   In the test example above, `caps` has the value:
   
   ```
   [arrow-string/src/regexp.rs:212] &caps = Captures(
       {
           0: Some(
               "foo",
           ),
       },
   )
   ```
   
   Relevant regex doc: 
https://docs.rs/regex/latest/regex/struct.Regex.html#method.captures
   
   Specifically:
   
   > Capture group `0` always corresponds to the entire match.
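
To make the effect of the skip concrete, here is a minimal, std-only sketch. It assumes the captures of one match can be modelled as a slice in which index `0` is the whole match and later indices are explicit groups. The slice model and both function names are hypothetical; they are not arrow-rs or `regex` APIs. The second function is only one possible way to avoid the empty result, not necessarily the fix that should be adopted.

```rust
/// Mirrors the behaviour described in this issue: group 0 is always skipped.
/// (Hypothetical helper; `caps` stands in for the captures of one match.)
fn groups_skipping_first(caps: &[Option<&str>]) -> Vec<String> {
    caps.iter()
        .skip(1) // drops the whole-match entry unconditionally
        .copied()
        .flatten() // ignore groups that did not participate in the match
        .map(str::to_owned)
        .collect()
}

/// One possible alternative: skip group 0 only when explicit groups exist,
/// so a pattern without groups still yields the whole match.
fn groups_or_whole_match(caps: &[Option<&str>]) -> Vec<String> {
    let skip = if caps.len() > 1 { 1 } else { 0 };
    caps.iter()
        .skip(skip)
        .copied()
        .flatten()
        .map(str::to_owned)
        .collect()
}

fn main() {
    // Pattern `foo` against "foo": no explicit groups, so only group 0.
    let no_groups = [Some("foo")];
    // Pattern `(f)(o+)` against "foo": group 0 plus two explicit groups.
    let with_groups = [Some("foo"), Some("f"), Some("oo")];

    println!("{:?}", groups_skipping_first(&no_groups));
    println!("{:?}", groups_or_whole_match(&no_groups));
    println!("{:?}", groups_skipping_first(&with_groups));
    println!("{:?}", groups_or_whole_match(&with_groups));
}
```

With the unconditional skip, the pattern without groups produces an empty list, which matches the failure in the test above. Both functions agree whenever the pattern has explicit capture groups.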
   
   Original issue: https://github.com/apache/arrow-datafusion/issues/5479

