jayzhan211 commented on issue #13099:
URL: https://github.com/apache/datafusion/issues/13099#issuecomment-2525483850

   > > > @alamb I would really appreciate any advice you could give when you 
have a moment.
   > > 
   > > 
   > > I think we would have to get some detailed profiling to really know for 
sure, but I suspect that ClickBench has non-trivial caches (buffer caching, 
page caches, etc.).
   > > DataFusion, as a serverless engine, does not have any such caching (the 
only difference between cold and hot runs is that on the hot run, data from 
disk will be in the Linux page cache, so it may not do any actual IO).
   > > It might also help to break down which queries showed the biggest 
discrepancy -- were they queries that already ran in 100ms (in which case 
caching and avoiding re-reading metadata might be a bigger part of processing)?
   > 
   > After conducting more experiments, I made some unexpected discoveries:
   > 
   > In the public ClickBench results, ClickHouse was using a version newer 
than 24.11, while our server had 24.1/24.3 installed. Therefore, I re-ran the 
benchmark using the latest version, 24.12, and this time the results were 
similar to those on the ClickBench website: DataFusion was faster than 
ClickHouse in both the cold run and hot run phases, and these results were 
consistently reproducible. This means that recent updates to ClickHouse have 
led to a decline in its query performance for Parquet files. In the earlier 
versions, ClickHouse still had better performance during the hot run phase.
   > 
   > @alamb FYI
   
   Do you know on which queries we still lag behind the old version of 
ClickHouse?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

