Hi all,

Thanks for joining the first community meeting. Below is the meeting recap, generated by AI and lightly edited by me for clarity. Please treat it as a reference.

- Meeting notes: https://docs.google.com/document/d/14NLYVvApvijsQDt7uCKblVPKhayJSxb6na9dMAp5NAM/edit?usp=sharing
- Meeting recording: https://fathom.video/share/xRtnrNXVr1P_1X2kQZ96nKRaPDEWSGCc (I will upload the recording to the ASF Cloudberry YouTube channel later.)

~~~~~

# Meeting Purpose

Kick off the first bi-weekly community meeting to align on progress and priorities.

# Key Takeaways

* PRs Blocked by Architectural Mismatch: Key PRs implementing Postgres-style features (e.g., parallel append) are stalled. They conflict with Cloudberry's MPP-style execution model, which requires pre-launching workers, unlike Postgres's dynamic approach.
* PXF Roadmap Defined: The PXF roadmap has three stages: 1) sync with upstream Greenplum PXF, 2) integrate with the latest kernel (e.g., parallel foreign table scans), and 3) add pushdown capabilities (aggregation, join).
* New Extensions Proposed: Two new extensions were proposed: HooksCollector for performance monitoring and Yezzey for S3 archiving of append-only tables to reduce storage costs.
* Release 2.1: Release 2.1 is code-complete but blocked on testing and documentation. The new binary swap feature is confirmed working, enabling zero-downtime upgrades.

# Topics

1. Main Repo & PR Review

1.1 Stalled PRs: A review of old, stalled PRs revealed a core architectural conflict.
  * Conflict: Postgres-style features (e.g., parallel append) rely on launching workers dynamically, which clashes with Cloudberry's MPP-style model of pre-launching workers before dispatching plans.
  * Action: Community reviews and feedback are encouraged to help find a solution.

1.2 Dianjin's PRs need more reviews.

2. Ecosystem Extensions

2.1 PXF (Platform Extension Framework)
  * Status: Code synced with upstream Greenplum PXF; source cleanup is in progress.
  * Roadmap:
    - Sync: Catch up with the upstream Greenplum PXF branch.
    - Integrate: Leverage the latest kernel's capabilities (e.g., parallel foreign table scans) via the pxf_fdw framework.
    - Pushdown: Add support for remote aggregation and join pushdown.
  * Blocker: Orca does not currently support foreign data wrappers (FDWs), which PXF uses. This must be addressed for full integration.
  * Warning: PXF's FDW implementation is not production-ready; VMware recommends it only in PXF 7.1.

2.2 WAL-G (Backup & Restore)
  * Status: No active development.
  * Gap: Untested with PAX storage, risking backup/restore failures.
  * Limitation: Does not support incremental backups for PAX tables due to their unique metadata.
  * Action: Max will provide PAX documentation to help the team understand its mechanics for WAL-G integration.

2.3 HooksCollector (Performance Monitoring)
  * Proposal: Open source the data-gathering component of Greenplum 6's Command Center.
  * Function: Collects query performance data via hooks and sends it to an external consumer via protobuf.
  * Goal: Attract community contributions and feedback.
  * Action: Dianjin will share the link for creating a formal proposal in GitHub Discussions.

2.4 Yezzey (S3 Archiving)
  * Proposal: An extension to upload/download append-only table data to/from S3.
  * Rationale: Reduce storage costs by moving cold data to cheaper object storage.
  * Action: Leonid will post the idea to the dev mailing list for public discussion.

2.5 Release & Governance

Release 2.1:
  * Status: Code-complete on the Release 2 branch.
  * Blockers: Requires more testing and user-facing documentation for building from source.
  * Binary Swap: The new feature is confirmed working, enabling zero-downtime upgrades.
  * Release Manager: Ed volunteered but may be unavailable. Dianjin is the backup.

3. Incubation Report: Leonid and Dianjin will collaborate on drafting the report.

4. Open Topics
  * 2026 Roadmap: Dianjin shared a draft roadmap on the dev mailing list for feedback.
  * Lakehouse Support: Leonid proposed adding Lakehouse support, noting high community interest in Russia.
  * Russian Documentation: Leonid's team will translate the documentation into Russian and propose hosting it on the official Cloudberry site to create a single source of truth.
  * TPC-DS Benchmarking:
    - Problem: Inconsistent TPC-DS test setups between teams yield non-comparable results, hindering effective performance tuning.
    - Proposed Solution: Integrate a TPC-DS benchmark tool directly into the database kernel (as DuckDB does) for easy, standardized execution.

# Next Steps

- Leonid:
  * Post the Yezzey S3 archiving proposal to the dev mailing list.
  * Post the Lakehouse support idea to the dev mailing list.
  * Collaborate with Dianjin on the incubation report.
  * Host the next community meeting.
- Dianjin:
  * Share the GitHub Discussions link for the HooksCollector proposal.
  * Confirm Ed's availability for the Release 2.1 manager role.
  * Share the 2026 roadmap draft on the dev mailing list.
  * Share the Shenzhen meetup materials (translated to English).
- Max:
  * Send PAX documentation to the team to aid WAL-G integration.
- All:
  * Review stalled PRs and provide feedback.
  * Discuss the TPC-DS benchmark standardization proposal on the dev mailing list.

Next Meeting:
- Rescheduled to February 27th to accommodate the Chinese New Year holiday.
