Hi folks,

I've been wanting a high-level overview of what's going on in the project, but I find it hard to keep up with all the mailing list threads. So I've added a section to the maintenance dashboard that Alenka and I have been working on. It gives a daily summary of the dev mailing list discussions from the past 90 days.
It can be found at [1], and I've pasted an example of its output (today's result) below [4]. It's super simple: we grab the dev mailing list, wrangle it into a shape we can work with, and then get an LLM to summarize it using a prompt we've iterated on a few times [2].

I'd love some feedback on it. If you also find this useful, what works for you in the current version, and what else would you like to see, or see done differently? I'm happy to take responses here or as issues on the repo [3].

Cheers,
Nic

[1] https://arrow-maintenance.github.io/arrowdash/#overall-summary
[2] https://github.com/arrow-maintenance/arrowdash/blob/main/ml_data/prompt_ml_summary.txt
[3] https://github.com/arrow-maintenance/arrowdash/issues
[4] Example summary below:

Ongoing Discussions

- ADBC Configuration: Finding a consensus on configuration file locations and formats for ADBC drivers, with environment variables being a favored option for ease of use (May 27, June 2).
- C++ CMake Build System: Simplifying the C++ CMake build configuration, specifically regarding static and shared library linkage, is being explored (June 3). The discussion includes whether to support building both shared and static libraries simultaneously.
- nanoarrow Release Process: Preparing for the nanoarrow 0.7.0 release, including pre-release checklist and finding a release signer (May 30, May 31).
- AWS Credit Usage: Discussing the optimal use of donated AWS credits for CI improvements, GPU testing, and benchmarking. Large memory tests were also mentioned (June 12).

Emerging Themes

- Project Component Decoupling: The effort to split language implementations into separate repositories continues, with Swift being the latest candidate for separation (May 16, May 19, May 20). This aims to improve maintainability and allow for language-specific adaptations. C# decoupling has also been raised (May 23).
- Modernizing C++ Standard: A push to switch the Arrow C++ codebase to require C++20 is gaining momentum, promising benefits like improved language features and library support (May 19).
- Community Engagement & Support: Identifying volunteers for the Arrow Summit selection committee indicates an ongoing need for community involvement (May 9, May 18, May 19). Kapa.ai’s bot presence on dev docs indicates potential benefit to community engagement (June 12).
- Legacy Feature Removal: There is a move to deprecate and eventually remove support for older features like Feather V1 format in C++, encouraging migration to newer formats (June 3).

Potential Roadblocks

- Swift Implementation Support: Concerns exist about the level of activity and support for the Swift implementation, even with the proposed repository split (May 19, May 20).
- Complexity of Build Systems: The discussion around CMake configuration highlights the complexity of managing build systems and supporting various build configurations (June 3).
- Maintaining Deprecated Features: Balancing the removal of legacy features with the potential impact on downstream users and existing workflows is a consideration, as seen in the Feather V1 deprecation discussion (June 3).

Strategic Plans

- Release Management: Proposing a feature freeze date for the Arrow monorepo 21.0.0 release (July 1) indicates a focus on timely and predictable releases (June 4).
- External Project Integration: A proposal to donate the arrow-gpu project suggests interest in expanding Arrow’s capabilities in GPU-accelerated computing (June 6).
- Resource Optimization: Leveraging donated AWS credits to improve CI and benchmarking infrastructure demonstrates a commitment to optimizing resources and improving development workflows (June 12).
- Codebase Cleanup: Removing stale Rust issues from the main repository suggests an ongoing effort to maintain a clean and organized codebase (June 12). Removal of Skyhook from the main repository (June 16) further demonstrates a desire to reduce the overall footprint of the repository.