samuelcolvin opened a new issue, #12062:
URL: https://github.com/apache/datafusion/issues/12062

   ### Describe the bug
   
   The performance of `array_has` seems to be pretty poor due to `RowConverter`.
   
   I compared running `array_has` queries against `json_contains` queries (not a 
particularly fair comparison, but that's not the point here). I'd expect 
`array_has` to be somewhat faster, but it's actually about 3200x slower:
   
   ```
   +----------+
   | count(*) |
   +----------+
   | 4828     |
   +----------+
   mode: SELECT count(*) FROM test where json_contains(json, 'service.name'), 
query took 31.696875ms
   +----------+
   | count(*) |
   +----------+
   | 4828     |
   +----------+
   mode: SELECT count(*) FROM test where array_has(list, 'service.name'), query 
took 102.430949125s
   ```
   
   Code for this example is 
[here](https://github.com/samuelcolvin/array-has-slow), and 
[here](https://profiler.firefox.com/from-url/http%3A%2F%2F127.0.0.1%3A3000%2F0f5cv741l13l1nnvrxbpgiq0ilqjcxf8gapkc1z%2Fprofile.json/flame-graph/?globalTrackOrder=0&hiddenLocalTracksByPid=36948-0w79wd&symbolServer=http%3A%2F%2F127.0.0.1%3A3000%2F0f5cv741l13l1nnvrxbpgiq0ilqjcxf8gapkc1z&thread=9&v=10)
 is a flame graph from `samply`. It shows that 99% of the time is spent in 
`RowConverter`:
   
   <img width="1727" alt="image" 
src="https://github.com/user-attachments/assets/1cb8cc42-5b33-4675-be4e-969841360a2c">
   
   
   ### To Reproduce
   
   Clone https://github.com/samuelcolvin/array-has-slow and run `cargo run 
--release`.
   
   ### Expected behavior
   
   `array_has` should be much faster.
   
   Most of the time is spent in `RowConverter`, but I also think `array_has` 
would be much faster if `general_array_has_dispatch` were special-cased or made 
generic over `ComparisonType`, rather than branching in the hot loop.
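
The idea of making the dispatch generic over the comparison mode can be sketched roughly as follows. This is a minimal illustration using only the standard library, not DataFusion's actual code: the `Comparison` trait, the `All` and `Any` marker types, the two-variant `ComparisonType` enum, and both functions are hypothetical stand-ins, and plain `Vec<&str>` rows stand in for Arrow list arrays. The mode is matched once, outside the loop, and each arm calls a monomorphized function whose loop body contains no branch on the mode.

```rust
/// Hypothetical trait: one implementation per comparison mode.
trait Comparison {
    fn matches(haystack: &[&str], needles: &[&str]) -> bool;
}

/// Hypothetical marker types standing in for the enum variants.
struct All;
struct Any;

impl Comparison for All {
    // True when every needle occurs in the row.
    fn matches(haystack: &[&str], needles: &[&str]) -> bool {
        needles.iter().all(|n| haystack.contains(n))
    }
}

impl Comparison for Any {
    // True when at least one needle occurs in the row.
    fn matches(haystack: &[&str], needles: &[&str]) -> bool {
        needles.iter().any(|n| haystack.contains(n))
    }
}

/// Compiled once per mode, so the per-row loop never inspects the mode.
fn array_has_generic<C: Comparison>(rows: &[Vec<&str>], needles: &[&str]) -> Vec<bool> {
    rows.iter().map(|row| C::matches(row, needles)).collect()
}

/// Simplified stand-in for the real enum (assumption: only two modes shown).
#[derive(Clone, Copy)]
enum ComparisonType {
    All,
    Any,
}

/// The only runtime branch on the mode happens here, once per call.
fn array_has_dispatch(rows: &[Vec<&str>], needles: &[&str], mode: ComparisonType) -> Vec<bool> {
    match mode {
        ComparisonType::All => array_has_generic::<All>(rows, needles),
        ComparisonType::Any => array_has_generic::<Any>(rows, needles),
    }
}
```

Calling `array_has_dispatch(&rows, &["service.name"], ComparisonType::Any)` yields one boolean per row. This sketch only addresses the branching; it does not touch the `RowConverter` cost, which the profile suggests is the dominant factor.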
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org