parthchandra commented on issue #679:
URL: 
https://github.com/apache/datafusion-comet/issues/679#issuecomment-2249216481

   I see different results in profiling. I ran a simple query, 
`select ss_net_profit from store_sales`, for **100** iterations with 
`useDecimal128` enabled and see the following:
   <img width="1591" alt="Screenshot 2024-07-24 at 5 48 23 PM" 
src="https://github.com/user-attachments/assets/dc6633db-eef6-487c-a8dc-869348ce10e9">
   
   What stands out is that the bulk of the time is spent in 
`comet::parquet::read::values::<impl comet::parquet::read::PlainDecoding for 
comet::parquet::data_type::Int32DecimalType>::decode`. 
   Within this method the main time consumers (as a percentage of CPU time) 
are:
     `core::slice::<impl [T]>::fill`   - 16.76%
     `comet::common::bit::memcpy`  - 7.07%
     `core::slice::<impl [T]>::fill` - 5.18% (second code path)
   
   Overall, Comet runs at 0.4x the speed of Spark.
     
     I made a change to `comet::common::bit::memcpy` to use 
`copy_nonoverlapping`, which is unsafe, and see a 25% improvement (after the 
change, Comet is 0.5x of Spark).
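
For illustration, a minimal sketch of what such a change could look like, using `std::ptr::copy_nonoverlapping` from the standard library. The signature below is an assumption made for this example and is not taken from the Comet source; the real `comet::common::bit::memcpy` may differ.

```rust
use std::ptr;

/// Hypothetical stand-in for `comet::common::bit::memcpy`.
/// The signature is assumed for this sketch, not copied from Comet.
pub fn memcpy(source: &[u8], target: &mut [u8]) {
    // Keep a bounds check so the unsafe block below cannot write past `target`.
    assert!(
        target.len() >= source.len(),
        "target is smaller than source"
    );
    // SAFETY: `source` is a shared borrow and `target` a unique borrow, so the
    // two regions cannot overlap, and both are valid for `source.len()` bytes
    // (checked by the assert above).
    unsafe {
        ptr::copy_nonoverlapping(source.as_ptr(), target.as_mut_ptr(), source.len());
    }
}
```

Calling `memcpy(&src, &mut dst)` copies `src.len()` bytes into the front of `dst` and leaves the rest of `dst` untouched. Whether this is faster than the safe `copy_from_slice` depends on how the compiler handles the surrounding code, so it is worth confirming with a profile.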
     
     However, I don't know the best way to avoid the `slice.fill` calls without 
voiding the warranty. I'm looking at 
[MaybeUninit](https://doc.rust-lang.org/std/mem/union.MaybeUninit.html), but 
the documentation quite rightly warns of there being dragons. 
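
One relatively contained way to use `MaybeUninit` is through `Vec::spare_capacity_mut`, which exposes the unused capacity as `&mut [MaybeUninit<T>]`. The sketch below assumes the `fill` calls exist only to initialize a destination buffer that is then fully overwritten; that is a guess about the decoder, and the helper name and signature are hypothetical. It widens little-endian `i32` values into `i128` slots without zero-filling first.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper (not part of Comet): decode little-endian i32 values
/// from `src` and widen each one to i128 without pre-filling the output.
/// Trailing bytes that do not form a whole i32 are ignored.
pub fn widen_i32_to_i128(src: &[u8]) -> Vec<i128> {
    let n = src.len() / 4;
    let mut out: Vec<i128> = Vec::with_capacity(n);

    // The spare capacity is at least `n` uninitialized slots.
    let spare: &mut [MaybeUninit<i128>] = out.spare_capacity_mut();
    for (slot, chunk) in spare.iter_mut().zip(src.chunks_exact(4)) {
        let v = i32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        // `as i128` sign-extends, so negative values stay negative.
        slot.write(v as i128);
    }

    // SAFETY: `chunks_exact(4)` yields exactly `n` chunks, so the loop above
    // initialized the first `n` slots, and `n` does not exceed the capacity.
    unsafe { out.set_len(n) };
    out
}
```

The only unsafe operation is the final `set_len`, and its invariant (every element below the new length has been written) is easy to check locally, which keeps the dragons confined to one line.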
   
    Also, with `useDecimal128` disabled, we are slower than Spark because we 
treat the value as a Decimal irrespective of precision. Spark reads and 
processes the value as `Int`. 
    A minor change to Comet results in Comet being 1.2x of Spark for this query 
with `useDecimal128` disabled.
    I'll post a PR after some testing.
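
As a rough illustration of the idea (not the actual change, which is not shown in this comment), a reader could choose the in-memory representation from the decimal precision. The type and function names below are hypothetical. The thresholds rely on the fact that every 9-digit decimal value fits in an `i32` and every 18-digit value fits in an `i64`.

```rust
/// Hypothetical enum naming the narrowest integer type that can hold
/// the unscaled value of a decimal with a given precision.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum DecimalRepr {
    Int32,
    Int64,
    Int128,
}

/// Pick a representation from the decimal precision (number of digits).
/// Hypothetical helper for this sketch; not an API from Comet or Spark.
pub fn repr_for_precision(precision: u8) -> DecimalRepr {
    match precision {
        // Up to 9 digits: the largest value, 999_999_999, fits in an i32.
        0..=9 => DecimalRepr::Int32,
        // Up to 18 digits: the largest value fits in an i64.
        10..=18 => DecimalRepr::Int64,
        // Anything wider needs 128 bits.
        _ => DecimalRepr::Int128,
    }
}
```

With a dispatch like this, low-precision columns can stay on the narrow integer path, and only wide decimals pay for 128-bit handling.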


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

