Omega359 opened a new issue, #9026:
URL: https://github.com/apache/arrow-datafusion/issues/9026
### Describe the bug
The `instr` function does not return the character position of the matched
substring; it returns a position derived from the match's byte index. The
issue is in this code in `string_expressions.rs`:
```
.map(|(string, substr)| match (string, substr) {
    (Some(string), Some(substr)) => string
        .find(substr)
        .map_or(Some(0), |index| Some((index + 1) as i32)),
    _ => None,
})
```
See the documentation for Rust's `str::find`, which returns the byte index of
the first match.
From the end user's perspective, the expected result is not the byte index but
rather the index of the first matching Unicode grapheme in the string.
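To make the discrepancy concrete, here is a minimal sketch contrasting the two offsets. The helper names `byte_offset` and `char_offset` are hypothetical and are not part of DataFusion. Note that it counts `char`s (Unicode scalar values), which is an assumption on my part: full grapheme-cluster segmentation is not in the Rust standard library and would need an external crate.

```rust
/// Byte offset of the first match, exactly as `str::find` reports it.
fn byte_offset(haystack: &str, needle: &str) -> Option<usize> {
    haystack.find(needle)
}

/// Number of `char`s (Unicode scalar values) that precede the first match.
/// `str::find` always returns an index on a char boundary, so the slice is safe.
fn char_offset(haystack: &str, needle: &str) -> Option<usize> {
    haystack
        .find(needle)
        .map(|byte_idx| haystack[..byte_idx].chars().count())
}
```

For ASCII-only input the two functions agree, which is presumably why the problem only shows up with multi-byte text.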
### To Reproduce
❯ docker run -it -v /tmp:/data datafusion-cli
DataFusion CLI v35.0.0
❯ select instr('实现这个函数', '函')
;
+----------------------------------------+
| instr(Utf8("实现这个函数"),Utf8("函")) |
+----------------------------------------+
| 13                                     |
+----------------------------------------+
1 row in set. Query took 0.002 seconds.
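The correct position is 5: `函` is the fifth character, while 13 is its 1-based
byte offset (each of the four preceding characters takes 3 bytes in UTF-8).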
### Expected behavior
The returned position should be the position of the first matching Unicode
grapheme in the string, not the first matching byte index.
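One possible shape for the expected behavior, as a sketch only and not the actual DataFusion fix: convert the byte index from `str::find` into a 1-based character position, keeping the existing convention of returning 0 when there is no match. The function name `instr_char_position` is hypothetical, and it counts `char`s rather than grapheme clusters, which is an assumption; the two coincide for the example in this report.

```rust
/// Hypothetical helper: 1-based character position of `substr` within
/// `string`, or 0 when `substr` does not occur (mirroring the current
/// `map_or(Some(0), ...)` convention).
fn instr_char_position(string: &str, substr: &str) -> i32 {
    string.find(substr).map_or(0, |byte_idx| {
        // Count the chars before the match, then shift to a 1-based position.
        (string[..byte_idx].chars().count() + 1) as i32
    })
}
```

With this helper, `instr_char_position("实现这个函数", "函")` evaluates to 5 rather than 13. The cost is an extra pass over the prefix of the string, which could be skipped for inputs known to be ASCII.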
### Additional context
_No response_