Hi Gavin,

Thanks so much for your positive feedback and valuable suggestion on Substrait! Your comment hits the key point of DACP's expression DSL design, and we really appreciate you pointing out this great project.

1. Response to Substrait Alignment

We've thoroughly studied Substrait's design (https://substrait.io/ & https://github.com/substrait-io/substrait) after your reminder. You're absolutely right about the overlapping goals: Substrait's standardization of relational algebra expressions aligns with DACP's need for consistent data transformation syntax across systems. The core difference, as you noticed, lies in DACP's focus on data access policies and collaborative capabilities beyond compute expressions.

DACP's current expression DSL (e.g., filter/select/limit in the SDF payload) was initially designed for scientific computing scenarios, prioritizing simplicity and integration with Arrow's in-memory format. However, we recognize the value of Substrait's cross-language interoperability. We're now evaluating how to align DACP's expression layer with Substrait's specification. This will help DACP integrate with other Substrait-enabled engines (like Arrow DataFusion) and avoid reinventing the wheel.

2. Update on DACP's Progress

Since our last email, we've made two key advancements:

- The PR to add DACP to Apache Arrow's "Powered By" list (https://github.com/apache/arrow-site/pull/728) has been approved with "LGTM" feedback, and we've adjusted the project's position to comply with alphabetical sorting requirements.
- We've refined the IETF draft (https://datatracker.ietf.org/doc/draft-shenzhihong-dacp/) to clarify the mapping between DACP's SDF operations and Arrow Flight's RPC methods (DoGet/DoAction/GetFlightInfo), enhancing protocol compatibility.

3. Further Collaboration Discussion

Building on your feedback, we'd love to explore two specific collaboration directions with the community:

- Substrait Integration: Would the Arrow community recommend full adoption of Substrait for DACP's expression layer, or a hybrid approach that retains DACP's collaboration-specific logic while reusing Substrait's relational algebra specs?
- Ecosystem Extension: As DACP is now on track to join the "Powered By Arrow" list, we're curious about the community's views on formalizing it as an "Arrow Flight Extension". This would involve aligning DACP's dataset catalog and provenance tracking with Arrow's metadata standards.

We're also happy to present a demo at the upcoming Apache Arrow biweekly meeting (Jan 14) to show DACP's integration with Arrow Flight and discuss Substrait alignment in detail.

Thanks again for your insightful input. Your suggestion will help DACP better fit into the open-source data ecosystem! Looking forward to continuing the discussion with you and the community.

Best regards,
Ziang Zhou
CNIC, Chinese Academy of Sciences
Email: [email protected]
Project Repo: https://github.com/rdcn-link/dftp-dacp
IETF Draft: https://datatracker.ietf.org/doc/draft-shenzhihong-dacp/

> -----Original Message-----
> From: "Gavin Ray" <[email protected]>
> Sent: 2025-11-24 22:37:50 (Monday)
> To: [email protected]
> Cc:
> Subject: Re: [DACP] Proposal: Integrate DACP (Data Access and Collaboration Protocol) into Apache Arrow Ecosystem
>
> Hiya Ziang,
>
> This is a neat project, thanks for sharing!
> One comment about the expression DSL you have created, for example in the
> below "SELECT, FILTER, LIMIT" expression:
>
> > Example SDF Specific Payload:
> > {
> >   "id": "dacp://10.0.0.1/weather_db/sensors",
> >   "actions": [
> >     ["filter", {"expression": "temperature > 25.0"}],
> >     ["select", {"columns": ["location", "temperature"]}],
> >     ["limit", {"n": 100}]
> >   ]
> > }
>
> Are you aware of the "Substrait" project, which attempts to standardize
> Relational Algebra expressions for sending over the wire between compute
> engines?
>
> substrait-io/substrait: A cross platform way to express data
> transformation, relational algebra, standardized record expression and
> plans. <https://github.com/substrait-io/substrait>
> Home - Substrait: Cross-Language Serialization for Relational Algebra
> <https://substrait.io/>
>
> It seems like DACP shares some of the same goals, but with the addition of
> access policies for data rather than just data expressions + compute?
>
>
> On Mon, Nov 24, 2025 at 5:42 AM 周子昂 <[email protected]> wrote:
>
> > Hi Apache Arrow Community,
> >
> > I'm Ziang Zhou from CNIC, Chinese Academy of Sciences. I'd like to share a
> > proposal about DACP (Data Access and Collaboration Protocol) on behalf of
> > my team, a protocol built on Apache Arrow Flight, and discuss potential
> > integration with the Arrow ecosystem.
> >
> > ### 1. Background of DACP
> > DACP is designed for cross-node, cross-process data access in scientific
> > and distributed computing environments. It addresses pain points like
> > fragmented data sharing, lack of collaboration support, and inefficient
> > streaming in existing solutions.
> >
> > ### 2. Relationship with Apache Arrow
> > DACP is tightly integrated with Apache Arrow Flight:
> > - Uses Arrow Flight as the underlying RPC layer for zero-copy, columnar
> > data transfer;
> > - Reuses Arrow's in-memory format for SDF (Streaming DataFrame), ensuring
> > interoperability with other Arrow-enabled systems;
> > - Extends Flight with high-level features like dataset catalog management,
> > end-to-end provenance tracking, and secure collaboration.
> >
> > ### 3. Current Status
> > - Project repo: https://github.com/rdcn-link/dftp-dacp
> > - IETF draft: https://datatracker.ietf.org/doc/draft-shenzhihong-dacp/
> > - Has been tested in scientific computing clusters for multi-node data
> > sharing, in scientific and distributed computing scenarios from the
> > Institute of Atmospheric Physics, CAS
> >
> > ### 4. Collaboration Request
> > We hope to:
> > 1. Get technical feedback from the Arrow community on DACP's design
> > (especially compatibility with Arrow Flight);
> > 2. Discuss the possibility of listing DACP as an official Arrow ecosystem
> > extension;
> > 3. Explore potential collaboration on protocol optimization (e.g.,
> > aligning SDF with Arrow's data model).
> >
> > We've already submitted a PR to add DACP to the "Powered By Apache Arrow"
> > list (PR link: https://github.com/apache/arrow-site/pull/728), and look
> > forward to your valuable comments.
> >
> > Thank you for your time!
> >
> > Best regards,
> > Ziang Zhou
> > CNIC, Chinese Academy of Sciences
> > Email: [email protected]
> > Project Repo: https://github.com/rdcn-link/dftp-dacp
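
For readers following the thread, the example SDF payload quoted in this thread can be read as an ordered pipeline of actions. The sketch below is only an illustration of that reading over plain Python dictionaries; it is not DACP's implementation. The helper names (`parse_simple_filter`, `apply_actions`), the sample rows, and the assumption that a filter expression has the form "column operator number" are all hypothetical and are not taken from the DACP draft.

```python
import operator

# Comparison operators accepted by this sketch. This set is an assumption
# made for illustration; it is not DACP's actual expression grammar.
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_simple_filter(expression):
    """Parse a 'column operator number' string (hypothetical format)."""
    parts = expression.split()
    if len(parts) != 3 or parts[1] not in _OPS:
        raise ValueError(f"unsupported filter expression: {expression!r}")
    column, op, literal = parts
    return column, _OPS[op], float(literal)


def apply_actions(rows, actions):
    """Apply filter/select/limit actions to a list of row dicts, in order."""
    for name, args in actions:
        if name == "filter":
            column, op, value = parse_simple_filter(args["expression"])
            rows = [r for r in rows if op(r[column], value)]
        elif name == "select":
            rows = [{c: r[c] for c in args["columns"]} for r in rows]
        elif name == "limit":
            rows = rows[: args["n"]]
        else:
            raise ValueError(f"unknown action: {name!r}")
    return rows


# Payload copied from the example in the thread.
payload = {
    "id": "dacp://10.0.0.1/weather_db/sensors",
    "actions": [
        ["filter", {"expression": "temperature > 25.0"}],
        ["select", {"columns": ["location", "temperature"]}],
        ["limit", {"n": 100}],
    ],
}

# Made-up sample rows, used only to exercise the sketch.
sample_rows = [
    {"location": "A", "temperature": 30.5, "humidity": 0.4},
    {"location": "B", "temperature": 21.0, "humidity": 0.6},
    {"location": "C", "temperature": 25.5, "humidity": 0.5},
]

result = apply_actions(sample_rows, payload["actions"])
print(result)
```

The order of the actions matters: filtering before selecting lets the filter refer to columns that the select step later drops. A Substrait plan would express the same steps as nested filter, project, and fetch relations rather than as a flat list.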
