andygrove opened a new issue, #1305:
URL: https://github.com/apache/datafusion-comet/issues/1305

   # Introduction
   
   Inspired by @alamb's weekly updates in DataFusion, I thought it would be a 
good idea to do something similar in Comet to keep contributors updated on what 
is happening in the project. These notes reflect things I am personally 
involved in or thinking about and may not cover all activities. Feel free to 
add comments for anything that I missed.
   
   # News
   
   Comet 0.5.0 has been released. It shows a 1.9x speedup for single node TPC-H 
@ 100GB, up from 1.7x in the previous release. Thank you to everyone who 
contributed. 
   
   Blog post: 
https://datafusion.apache.org/blog/2025/01/17/datafusion-comet-0.5.0/
   
   # comet-parquet-exec
   
   @mbutrovich and @parthchandra have been working in the 
[comet-parquet-exec](https://github.com/apache/datafusion-comet/tree/comet-parquet-exec)
 branch on adding support for using DataFusion's ParquetExec as an alternative 
to Comet's current native Parquet reader. This has the advantage of supporting 
reading complex types from Parquet and may also provide some performance 
improvements, although we won't know for sure until it is fully implemented. We 
are now at a point where we would like to merge this work into main and are 
working on fixing some test regressions so that we can do that. I'm hoping that 
we can get this merged in the next week.
   
   # Array support
   
   There are multiple community PRs, either in draft or in review, that add 
array functions. The epic for tracking this effort is 
https://github.com/apache/datafusion-comet/issues/1042. There is also a PR to 
add array data generation to the fuzz testing tool: 
https://github.com/apache/datafusion-comet/pull/1292. The goal is to help find 
edge cases that are not currently handled, although none have been found so far.
   
   # Quality
   
   It is really important to make sure that Comet produces the same results as 
Spark. We currently rely on Spark's tests as well as additional 
unit/integration tests in Comet but it is difficult to cover every possible 
edge case when adding new expressions. I am starting to think about how we can 
improve and simplify our testing efforts to increase test coverage. Comet has a 
fuzz testing tool that has been helpful but it is not very sophisticated yet, 
and we only run it occasionally. I plan on experimenting with some automated 
fuzz testing that runs as part of the integration test suite.
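
The idea of comparing two engines on randomly generated input can be sketched as follows. This is only a minimal illustration of differential fuzz testing and is not Comet's fuzz testing tool: `reference_array_max`, `candidate_array_max`, `random_int_array`, and `fuzz_compare` are hypothetical names, and the two pure-Python functions are stand-ins for Spark and Comet evaluating the same expression.

```python
import random


def random_int_array(rng, max_len=5):
    """Generate an int array that deliberately includes edge cases."""
    if rng.random() < 0.1:
        return None  # the whole array is null
    length = rng.randint(0, max_len)  # length 0 gives an empty array
    edge_values = [None, 0, -1, 2**31 - 1, -(2**31)]
    return [
        rng.choice(edge_values) if rng.random() < 0.3 else rng.randint(-100, 100)
        for _ in range(length)
    ]


def reference_array_max(values):
    """Hypothetical stand-in for the trusted engine; skips null elements."""
    if values is None:
        return None
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


def candidate_array_max(values):
    """Hypothetical stand-in for the engine under test, written differently."""
    if values is None:
        return None
    best = None
    for v in values:
        if v is not None and (best is None or v > best):
            best = v
    return best


def fuzz_compare(reference, candidate, iterations=1000, seed=42):
    """Run both implementations on random input and collect any mismatches."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    mismatches = []
    for _ in range(iterations):
        values = random_int_array(rng)
        expected = reference(values)
        actual = candidate(values)
        if expected != actual:
            mismatches.append((values, expected, actual))
    return mismatches


if __name__ == "__main__":
    failures = fuzz_compare(reference_array_max, candidate_array_max)
    print(f"{len(failures)} mismatches")
```

Seeding the generator keeps each run reproducible, which matters if a check like this runs as part of a test suite: a failing input can be replayed exactly.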
   
   # Community
   
   * [Weekly 
Call](https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit#heading=h.kpjkpncdmt1g)
   * Slack/Discord: [info 
links](https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord)
 
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

