sanderson opened a new issue, #9472:
URL: https://github.com/apache/arrow-datafusion/issues/9472
### Describe the bug
The `substr_index` function returns the expected result only for the first
row in a table, and an incorrect result for every subsequent row, when the
count argument is negative. It appears to reuse the delimiter position found
in the first row when processing all subsequent rows. It also includes the
delimiter in the returned substring, which I don't believe it should.
### To Reproduce
Run the following query:
```sql
SELECT
url,
substr_index(url, '.', 1) AS subdomain,
substr_index(url, '.', -1) AS tld
FROM
(values ('docs.influxdata.com'),
('community.influxdata.com'),
('arrow.apache.org')
) data(url)
```
This returns the following result:
| url | subdomain | tld |
|--------------------------|-----------|-----------|
| docs.influxdata.com | docs | .com |
| community.influxdata.com | community | xdata.com |
| arrow.apache.org | arrow | e.org |
### Expected behavior
I would expect the above query to return the following:
| url | subdomain | tld |
| :----------------------- | :-------- | :-- |
| docs.influxdata.com | docs | com |
| community.influxdata.com | community | com |
| arrow.apache.org | arrow | org |
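To make the expected per-row behavior concrete, here is a minimal Python sketch. It assumes `substr_index` is meant to behave like MySQL's `SUBSTRING_INDEX`, which the issue does not confirm. The Python function below is a hypothetical reference and not DataFusion's implementation; edge cases such as overlapping delimiters may differ.

```python
def substr_index(s: str, delim: str, count: int) -> str:
    """Hypothetical reference for the expected result on a single row.

    Assumption: semantics follow MySQL's SUBSTRING_INDEX. A positive count
    keeps everything before the count-th delimiter from the left; a negative
    count keeps everything after the count-th delimiter from the right.
    The delimiter at the cut point is never part of the result.
    """
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    if count > 0:
        # Keep the first `count` pieces; the whole string if there are fewer.
        return delim.join(parts[:count])
    # Keep the last `-count` pieces; the whole string if there are fewer.
    return delim.join(parts[count:])


# Each row is evaluated independently of the others.
urls = ["docs.influxdata.com", "community.influxdata.com", "arrow.apache.org"]
for url in urls:
    print(url, substr_index(url, ".", 1), substr_index(url, ".", -1))
```

Because the delimiter position is computed from each input string on its own, no row's result depends on any other row, which is the property the query above appears to violate.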
### Additional context
I tested this in InfluxDB v3's implementation of DataFusion, but I believe
these changes were just pulled from upstream.