alamb commented on issue #20874: URL: https://github.com/apache/datafusion/issues/20874#issuecomment-4673035612
Here is the report that was submitted

```
## Description:
The mission of Apache DataFusion is the creation and maintenance of software
related to an extensible query engine

## Project Status:
Current project status: New + Ongoing (high activity)
Issues for the board: None

## Membership Data:
Apache DataFusion was founded 2024-04-16 (2 years ago)
There are currently 58 committers and 22 PMC members in this project.
The Committer-to-PMC ratio is roughly 8:3.

Community changes, past quarter:
- No new PMC members. Last addition was Adrian Garcia Badaracco on 2026-02-01.
- Bhargava Vadlamani was added as committer on 2026-04-28
- Kumar Ujjawal was added as committer on 2026-04-28

## Project Activity:
Note that almost all communication for DataFusion and its subprojects happens
on GitHub and so our dev mailing list traffic is fairly light.

### DataFusion core
https://github.com/apache/datafusion

54.0.0 was released on 2026-06-09.
53.1.0 was released on 2026-04-16.
53.0.0 was released on 2026-03-23.
52.5.0 was released on 2026-04-11.
52.4.0 was released on 2026-03-22.
52.3.0 was released on 2026-03-12.

Our releases now consist of contributions from over 120 distinct contributors
(was 100), and we average around [9.2 commits per day] to the main repo (up
from [7.8 commits per day])

[9.2 commits per day]: git rev-list --count apache/main --since='2026-03-10 00:00:00' --until='2026-06-08 23:59:59'
[7.8 commits per day]: git rev-list --count apache/main --since='2026-02-09 00:00:00' --until='2026-03-09 23:59:59'

The community continues to write blogs highlighting our work, see
https://datafusion.apache.org/blog/

We continue to hold small-scale in-person meetups in various locations, which
have been successful in bringing together contributors. We had events in
Portland, Seattle, NYC, and Stockholm, and are trying to hold more in Asia,
such as in China.
See a list here:
https://datafusion.apache.org/user-guide/concepts-readings-events.html#community-events

The overall number of PRs in need of review has been growing, likely due to
increasing use of AI coding tools and the overall growth of the community.

As the project matures, time is extending between major releases, likely due
to increased testing and attention to quality.

### Sub project: DataFusion Python
https://github.com/apache/datafusion-python

DATAFUSION-PYTHON-53.0.0 was released on 2026-04-12.
DATAFUSION-PYTHON-52.3.0 was released on 2026-03-16.

In version 53.0.0 we introduced new AI workflows into the project. The primary
outcome of this is to provide a method to ensure we have consistent coverage
between the exposed datafusion-python APIs and the upstream functions in the
core repository. This workflow exposed 55 function gaps between the two
repositories that were then corrected.

Additionally, the datafusion-python project went through a massive overhaul of
the documentation of the API surface area to include usage docstrings directly
aimed at improving the ability of LLM agents to write effective
datafusion-python code.

We have additionally released an agent skill that improves the ability of LLMs
to write idiomatic datafusion-python code. This has been tested against the
TPC-H queries, where agents can now faithfully reproduce queries to pass these
tests using only the text description of the query.

Since the release of 53.0.0 we have added two new LLM skills to complement the
above work. First, we added a skill that ensures all of the newly exposed
functions are “pythonic” in nature rather than just exposing the Rust
interface directly. Second, we have a skill that verifies that the user-facing
skill to write idiomatic code is kept up to date with the API surface area of
the project.

We have published a blog based on the experience of writing these agent skills.
You can read it here:
https://datafusion.apache.org/blog/2026/05/28/writing-agent-skills/

### New sub project: DataFusion Java
We have added Java Bindings as a subproject. You can read about it here:
https://datafusion.apache.org/blog/output/2026/05/26/datafusion-java-0.1.0/

### Sub project: DataFusion Comet
COMET-0.16.0 was released on 2026-01-29.

https://github.com/apache/datafusion-comet

You can read about the recent happenings in Comet in the blogs:
https://datafusion.apache.org/blog/2026/05/07/datafusion-comet-0.16.0

### Sub project: DataFusion Ballista
https://github.com/apache/datafusion-ballista

BALLISTA-53.0.0 was released on 2026-05-24.
BALLISTA-52.0.0 was released on 2026-03-07.
BALLISTA-51.0.0 was released on 2026-01-19.

The community has published a new post outlining changes to Ballista in the
last 12 months:
https://datafusion.apache.org/blog/output/2026/05/24/datafusion-ballista-53.0.0/

There has been an increase in the number of contributions to Ballista, and in
PR reviews, which is very positive. Efforts were focused on improving the
observability of running jobs and usability, with the hope of improving
Ballista robustness and performance for SF1000+ workloads. I hope this trend
of increased contributions is going to persist in the future.

### Sub project: sqlparser-rs
SQLPARSER-0.62.0 was released on 2026-05-27.

https://github.com/apache/datafusion-sqlparser-rs

Ifeanyi Ubah (iffyio) continues to review most PRs in this repo.

## Community Health:
While we, as always, struggle with code review capacity, we have many active
committers, and the community in general helps each other out with reviews.

We continue to actively grow our committer and PMC ranks.

We continue to merge multiple PRs a day from multiple committers and have
contributions from a wide variety of individuals with a wide variety of
employers, organizations, and backgrounds.
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
