alamb commented on PR #11802:
URL: https://github.com/apache/datafusion/pull/11802#issuecomment-2271894409

   > @alamb I think I finally found the reason: it does not seem to be measurement noise for the long queries (such as q22 in clickbench)...
   > 
   > Introducing `Arc<Statistic>` into `PartitionedFile` may actually make the long queries slower. The details are above: although using `Arc` decreases `instructions`, it increases `bus-cycles`, which finally leads to higher `cycles` (slower).
   > 
   > I guess it is related to the `atomic` in the `Arc`: when the number of `PartitionedFile`s becomes large, the cost of the `atomic` becomes non-trivial. But I am not sure, it is just a guess.
   > 
   > I have now removed the `Arc<Statistic>` from `PartitionedFile` so as not to hurt the long queries.
   > 
   > The new benchmarks are shown below.
   
   I find it very strange that `Arc` in statistics should show up at all in the execution times -- I would expect that a query that takes seconds to run would not look at the statistics once the query has started, and I would expect the actual processing time to dominate 🤔
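
For reference, here is a minimal sketch of the mechanism the `atomic` guess refers to. Every `Arc::clone` increments a shared strong count with an atomic operation, and every drop decrements it. `FileStats` and `share_stats` are hypothetical names used only for illustration; they are not DataFusion types, and the sketch measures nothing, so it says nothing about whether this cost is visible during query execution.

```rust
use std::sync::Arc;

// Hypothetical stand-in for per-file statistics; not DataFusion's real type.
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    num_rows: usize,
    total_byte_size: usize,
}

/// Clone one shared `Arc<FileStats>` `n` times and return the clones.
/// Each `Arc::clone` atomically increments the shared strong count
/// instead of copying the `FileStats` value.
fn share_stats(stats: &Arc<FileStats>, n: usize) -> Vec<Arc<FileStats>> {
    (0..n).map(|_| Arc::clone(stats)).collect()
}

fn main() {
    let stats = Arc::new(FileStats {
        num_rows: 1_000,
        total_byte_size: 64_000,
    });

    // One original handle plus four clones share a single allocation.
    let clones = share_stats(&stats, 4);
    println!("strong count = {}", Arc::strong_count(&stats));

    // Dropping the clones atomically decrements the count again.
    drop(clones);
    println!("strong count after drop = {}", Arc::strong_count(&stats));
}
```

Whether these increments and decrements matter depends on how often the handles are cloned or dropped on a hot path, which is the part a profile would need to confirm.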
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

