Hi all,

Thanks for joining the first community meeting. Below is an
AI-generated meeting recap, lightly edited by me for clarity. Please
treat it as a reference only.

- Meeting Notes:
https://docs.google.com/document/d/14NLYVvApvijsQDt7uCKblVPKhayJSxb6na9dMAp5NAM/edit?usp=sharing
- Meeting recording:
https://fathom.video/share/xRtnrNXVr1P_1X2kQZ96nKRaPDEWSGCc (I will
upload the recording to the ASF Cloudberry YouTube channel later.)

~~~~~

# Meeting Purpose

Kick off the first bi-weekly community meeting to align on progress
and priorities.

# Key Takeaways

* PRs Blocked by Architectural Mismatch: Key PRs implementing
Postgres-style features (e.g., parallel append) are stalled. They
conflict with Cloudberry's MPP-style execution model, which requires
pre-launching workers, unlike Postgres's dynamic approach.

* PXF Roadmap Defined: The PXF roadmap has three stages: 1) sync with
upstream Greenplum PXF, 2) integrate with the latest kernel (e.g.,
parallel foreign table scans), and 3) add pushdown capabilities
(aggregation, join).

* New Extensions Proposed: Two new extensions were proposed:
HooksCollector for performance monitoring and yezzey for S3 archiving
of append-only tables to reduce storage costs.

* Release 2.1: Release 2.1 is code-complete but blocked on testing and
documentation. The new binary swap feature is confirmed working,
enabling zero-downtime upgrades.

# Topics

1. Main Repo & PR Review

1.1 Stalled PRs: A review of old, stalled PRs revealed a core
architectural conflict.
* Conflict: Postgres-style features (e.g., parallel append) rely on
dynamic worker launching, which clashes with Cloudberry's MPP-style
model of pre-launching workers before dispatching plans.
* Action: Community reviews and feedback are encouraged to help find a solution.

1.2 Dianjin's PRs need more reviews

2. Ecosystem Extensions

2.1 PXF (Platform Extension Framework)
* Status: Code synced with upstream Greenplum PXF; source cleanup is
in progress.
* Roadmap:
  - Sync: Catch up with the upstream Greenplum PXF branch.
  - Integrate: Leverage the latest kernel's capabilities (e.g.,
parallel foreign table scans) via the pxf_fdw framework.
  - Pushdown: Add support for remote aggregation and join pushdown.
  - Blocker: Orca does not currently support foreign data wrappers
(FDWs), which PXF uses. This must be addressed for full integration.

Warning: PXF's FDW implementation is not production-ready; VMware
recommends it only in PXF 7.1.

2.2 WAL-G (Backup & Restore)
* Status: No active development.
* Gap: Untested with PAX storage, risking backup/restore failures.
* Limitation: Does not support incremental backups for PAX tables due
to their unique metadata.
* Action: Max will provide PAX documentation to help the team
understand its mechanics for WAL-G integration.

2.3 HooksCollector (Performance Monitoring)
* Proposal: Open source the data-gathering component of Greenplum 6's
Command Center.
* Function: Collects query performance data via hooks and sends it
externally via protobuf.
* Goal: Attract community contributions and feedback.
* Action: Dianjin will share the link for creating a formal proposal
in GitHub Discussions.

2.4 Yezzey (S3 Archiving)
* Proposal: An extension to upload/download append-only table data to/from S3.
* Rationale: To reduce storage costs by moving cold data to cheaper
object storage.
* Action: Leonid will post the idea to the dev mailing list for public
discussion.

3. Release & Governance

3.1 Release 2.1
* Status: Code-complete on the Release 2 branch.
* Blockers: Requires more testing and user-facing documentation for
building from source.
* Binary Swap: The new feature is confirmed working, enabling
zero-downtime upgrades.
* Release Manager: Ed volunteered but may be unavailable; Dianjin is
the backup.

3.2 Incubation Report: Leonid and Dianjin will collaborate on drafting
the report.

4. Open Topics

* 2026 Roadmap: Dianjin shared a draft roadmap on the dev mailing list
for feedback.
* Lakehouse Support: Leonid proposed adding Lakehouse support, noting
high community interest in Russia.
* Russian Documentation: Leonid's team will translate the documentation
into Russian and propose hosting it on the official Cloudberry site to
create a single source of truth.
* TPC-DS Benchmarking:
 - Problem: Inconsistent TPC-DS test setups between teams yield
non-comparable results, hindering effective performance tuning.
 - Proposed Solution: Integrate a TPC-DS benchmark tool directly into
the database kernel (as DuckDB does) for easy, standardized execution.

# Next Steps

- Leonid:
 * Post the yezzey S3 archiving proposal to the dev mailing list.
 * Post the Lakehouse support idea to the dev mailing list.
 * Collaborate with Dianjin on the incubation report.
 * Host the next community meeting.

- Dianjin:
 * Share the GitHub Discussions link for the HooksCollector proposal.
 * Confirm Ed's availability for the Release 2.1 manager role.
 * Share the 2026 roadmap draft on the dev mailing list.
 * Share the Shenzhen meetup materials (translated to English).

- Max:
 * Send PAX documentation to the team to aid WAL-G integration.

- All:
 * Review stalled PRs and provide feedback.
 * Discuss the TPC-DS benchmark standardization proposal on the dev
mailing list.

Next Meeting:
- Rescheduled to February 27th to accommodate the Chinese New Year holiday.
