GitHub user yjhjstz added a comment to the discussion: [Proposal] Iceberg subsystem for datalake_fdw — design proposal
I'd like to advocate for using https://github.com/apache/iceberg-cpp on the metadata path instead of the Java agent, and for Cloudberry to invest in that community together.

Why I think iceberg-cpp is the right long-term bet:

1. No JVM sidecar. The datalake_proxy bgworker that forks and supervises a Java process is real operational complexity: JVM heap tuning, GC pauses, two processes to monitor, and a gRPC hop on every metadata operation. A native C++ library eliminates all of this.

2. Architecture coherence. Cloudberry's core is C/C++. A native library fits naturally into the postmaster/backend process model; a Java subprocess is an alien runtime that complicates crash handling, signal propagation, and resource control.

3. iceberg-cpp is early, but that's precisely the opportunity. The concern about maturity is valid today, but Apache iceberg-cpp is an official Apache project on an active growth trajectory. Rather than working around its gaps by routing through iceberg-java, Cloudberry can close those gaps: contributing the missing spec coverage (partition evolution, equality deletes, CAS commit logic, catalog backends) would benefit the entire ecosystem, not just Cloudberry.

4. Shared investment with the community. leborchuk mentioned https://github.com/lithium-tech/iceberg-cxx as a performance reference. There's also TEA. There are clearly multiple teams working on C++ Iceberg I/O. If we converge on Apache iceberg-cpp as the shared foundation, the effort compounds instead of fragmenting.

5. The metadata path is not forever "not hot". Fragment planning (/fragments) is called on every SELECT. As table file counts grow into the millions, the gRPC round-trip and Java deserialization will show up in query latency.

GitHub link: https://github.com/apache/cloudberry/discussions/1683#discussioncomment-16856371
