parthchandra commented on issue #679: URL: https://github.com/apache/datafusion-comet/issues/679#issuecomment-2249216481
I see different results in profiling. I ran a simple query, `select ss_net_profit from store_sales`, for **100** iterations with `useDecimal128` enabled and see the following:

<img width="1591" alt="Screenshot 2024-07-24 at 5 48 23 PM" src="https://github.com/user-attachments/assets/dc6633db-eef6-487c-a8dc-869348ce10e9">

What stands out is that the bulk of the time is spent in `comet::parquet::read::values::<impl comet::parquet::read::PlainDecoding for comet::parquet::data_type::Int32DecimalType>::decode`. Within this method, the main time consumers (as a percentage of CPU time) are:

- `core::slice::<impl [T]>::fill`: 16.76%
- `comet::common::bit::memcpy`: 7.07%
- `core::slice::<impl [T]>::fill`: 5.18% (second code path)

Overall, Comet runs at 0.4x the speed of Spark.

I changed `comet::common::bit::memcpy` to use `copy_nonoverlapping`, which is unsafe, and see a 25% improvement (after the change, Comet runs at 0.5x the speed of Spark). However, I don't know the best way to avoid the `slice.fill` calls without voiding the warranty. I'm looking at [MaybeUninit](https://doc.rust-lang.org/std/mem/union.MaybeUninit.html), but the documentation quite rightly warns that there be dragons.

Also, with `useDecimal128` disabled, we are slower than Spark because we treat the value as a Decimal irrespective of precision, whereas Spark reads and processes the value as an `Int`. A minor change to Comet makes it 1.2x the speed of Spark for this query with `useDecimal128` disabled. I'll post a PR after some testing.
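A minimal, self-contained Rust sketch of the patterns discussed above, for readers who want to experiment. It assumes (this is not taken from Comet's source) that the `fill` calls sign-extend each 4-byte INT32 value into a 16-byte Decimal128 slot; every function and type name here is hypothetical. `decode_fill` models a copy-then-fill pattern, `decode_widen` sign-extends in a register and writes each slot once via `i128::to_le_bytes`, `decode_into_vec` avoids pre-zeroing an output buffer by collecting into a `Vec`, `copy_bytes` wraps the unsafe `std::ptr::copy_nonoverlapping`, and `narrowest_repr` picks the smallest integer width for a given precision (9 digits fit in `i32` and 18 in `i64`, matching the Parquet DECIMAL physical-type limits and Spark's `Int`/`Long` thresholds).

```rust
/// Hypothetical baseline: copy the 4 value bytes, then fill the upper 12
/// bytes of each 16-byte slot with the sign extension (two writes per value).
fn decode_fill(src: &[u8], dst: &mut [u8]) {
    for (chunk, out) in src.chunks_exact(4).zip(dst.chunks_exact_mut(16)) {
        out[..4].copy_from_slice(chunk);
        // Top bit of the most significant (last, little-endian) byte is the sign.
        let ext = if chunk[3] & 0x80 != 0 { 0xFF } else { 0x00 };
        out[4..].fill(ext);
    }
}

/// Hypothetical alternative: sign-extend in a register, then write all
/// 16 bytes of the slot in a single copy, with no separate fill.
fn decode_widen(src: &[u8], dst: &mut [u8]) {
    for (chunk, out) in src.chunks_exact(4).zip(dst.chunks_exact_mut(16)) {
        let v = i32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        out.copy_from_slice(&(v as i128).to_le_bytes());
    }
}

/// Hypothetical alternative: build the output by appending, so the buffer
/// never needs to be zero-initialized up front (and no MaybeUninit is needed).
fn decode_into_vec(src: &[u8]) -> Vec<i128> {
    src.chunks_exact(4)
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]) as i128)
        .collect()
}

/// Unsafe bulk copy of `src` into the front of `dst`.
fn copy_bytes(src: &[u8], dst: &mut [u8]) {
    assert!(dst.len() >= src.len(), "destination too small");
    // SAFETY: `src` is a shared borrow and `dst` a mutable borrow, so they
    // cannot overlap; the assert guarantees `dst` has room for `src.len()`
    // bytes; `u8` has alignment 1.
    unsafe { std::ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), src.len()) }
}

/// Hypothetical in-memory width choice for a decimal of a given precision.
#[derive(Debug, PartialEq)]
enum DecimalRepr {
    I32,
    I64,
    I128,
}

fn narrowest_repr(precision: u8) -> DecimalRepr {
    match precision {
        0..=9 => DecimalRepr::I32,
        10..=18 => DecimalRepr::I64,
        _ => DecimalRepr::I128,
    }
}
```

`decode_widen` and `decode_into_vec` stay in safe Rust, so they sidestep `MaybeUninit` entirely; whether either beats the current decoder would have to be measured with the same profiling setup. Note also that the standard library's `<[T]>::copy_from_slice` checks that the lengths match and then performs a non-overlapping copy, so a safe equal-length copy may already perform close to the unsafe version.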