sanderson opened a new issue, #9472:
URL: https://github.com/apache/arrow-datafusion/issues/9472

   ### Describe the bug
   
   The `substr_index` function returns the expected result only for the first 
row of a table when given a negative count (third argument). Every subsequent 
row gets an incorrect result: the function appears to reuse the delimiter 
position found in the first row when processing all later rows. It also 
includes the delimiter in the returned substring, which I don't believe it 
should.
   
   ### To Reproduce
   
   Run the following query. The table below it shows the output I currently get:
   
   ```sql
   SELECT
     url,
     substr_index(url, '.', 1) AS subdomain,
     substr_index(url, '.', -1) AS tld
   FROM
     (values ('docs.influxdata.com'),
             ('community.influxdata.com'),
             ('arrow.apache.org')
     ) data(url)
   ```
   
   | url                      | subdomain | tld       |
   |--------------------------|-----------|-----------|
   | docs.influxdata.com      | docs      | .com      |
   | community.influxdata.com | community | xdata.com |
   | arrow.apache.org         | arrow     | e.org     |
   
   ### Expected behavior
   
   I would expect the above query to return the following:
   
   | url                      | subdomain | tld |
   | :----------------------- | :-------- | :-- |
   | docs.influxdata.com      | docs      | com |
   | community.influxdata.com | community | com |
   | arrow.apache.org         | arrow     | org |
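
For reference, the expected values above correspond to evaluating each row independently and never including the delimiter at the cut point. The sketch below is a minimal Python model of those semantics, assuming `substr_index` is meant to behave like MySQL's `SUBSTRING_INDEX`. It is not DataFusion's implementation, and the helper name `expected_substr_index` is hypothetical.

```python
def expected_substr_index(s: str, delim: str, count: int) -> str:
    """Model of the expected per-row result (assumed MySQL-like semantics).

    count > 0: everything before the count-th delimiter, counting from the left.
    count < 0: everything after the |count|-th delimiter, counting from the right.
    """
    # Assumption: a zero count or an empty delimiter yields an empty string.
    if count == 0 or delim == "":
        return ""
    parts = s.split(delim)
    if count > 0:
        # Keep the first `count` pieces; the delimiter at the cut is dropped.
        return delim.join(parts[:count])
    # Keep the last |count| pieces; the delimiter at the cut is dropped.
    return delim.join(parts[count:])


urls = ["docs.influxdata.com", "community.influxdata.com", "arrow.apache.org"]
for url in urls:
    # Each row is computed from its own delimiter positions.
    print(url, expected_substr_index(url, ".", 1), expected_substr_index(url, ".", -1))
```

Each row is split on its own delimiter positions, so nothing carries over from one row to the next.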
   
   ### Additional context
   
   I tested this using the DataFusion version bundled with InfluxDB v3, but I 
believe the `substr_index` implementation there was pulled unchanged from 
upstream.

