parthchandra commented on PR #671:
URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2234180553
> TPCDS Micro Benchmarks:                 Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
> -----------------------------------------------------------------------------------------------------------------
> add_many_decimals                               20502         20648        208       14.0         71.2      1.0X
> add_many_decimals                               20498         20544         65       14.1         71.2      1.0X
> add_many_decimals: Comet (Scan)                 28143         28161         26       10.2         97.7      0.7X
> add_many_decimals: Comet (Scan, Exec)           19323         19497        246       14.9         67.1      1.1X
Just looking at this one case, with decimal fields and only scan enabled, we
are much slower. This is consistent with something I saw when working on the
parallel reader.
From a profiling run I saw that a potential bottleneck was
[CometVector.getDecimal](https://github.com/apache/datafusion-comet/blob/7ac2fb9f6672fbb29c2e5de7e62b457efc0bfebf/common/src/main/java/org/apache/comet/vector/CometVector.java#L74),
which performs an expensive creation of a BigInteger followed by an
expensive creation of a BigDecimal.
However, this path would be hit only for precision > 18 or if
`spark.comet.use.decimal128` is set to `true` (it is `false` by default).
I'm not sure there is a way to eliminate it, though.
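For context, a rough sketch of what that conversion path does (this is a simplified illustration, not the actual Comet source): decimal128 values are stored as 16-byte little-endian integers, so each `getDecimal` call copies and reverses the bytes, allocates a `BigInteger`, and then wraps it in a `BigDecimal`, which is several allocations per row:

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalDecodeSketch {
    // Sketch of the costly path: every call allocates a byte[] copy,
    // a BigInteger, and a BigDecimal.
    static BigDecimal getDecimal(byte[] littleEndian, int scale) {
        // The on-disk/in-memory layout is little-endian, but
        // BigInteger's byte[] constructor expects big-endian.
        byte[] bigEndian = new byte[littleEndian.length];
        for (int i = 0; i < littleEndian.length; i++) {
            bigEndian[i] = littleEndian[littleEndian.length - 1 - i];
        }
        return new BigDecimal(new BigInteger(bigEndian), scale);
    }

    public static void main(String[] args) {
        // Unscaled value 123456 stored as a 16-byte little-endian
        // integer; with scale 2 this decodes to 1234.56.
        byte[] raw = new byte[16];
        long v = 123456L;
        for (int i = 0; i < 8; i++) {
            raw[i] = (byte) (v >>> (8 * i));
        }
        System.out.println(getDecimal(raw, 2));
    }
}
```

The per-row allocations are why this shows up in profiles for wide decimal columns; avoiding them would likely require a fast path that skips the `BigInteger` intermediate for values that fit in a `long`.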
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]