[ https://issues.apache.org/jira/browse/GSOC-302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Calvin Kirs updated GSOC-302:
-----------------------------
    Labels: gsoc2025 longtime  (was: )

> Apache Doris: Enhancing Group Commit Functionality
> --------------------------------------------------
>
>                 Key: GSOC-302
>                 URL: https://issues.apache.org/jira/browse/GSOC-302
>             Project: Comdev GSOC
>          Issue Type: Wish
>            Reporter: Calvin Kirs
>            Priority: Major
>              Labels: gsoc2025, longtime
>
> h2. *Synopsis*
> The current Group Commit mechanism in Apache Doris batches incoming data and commits it only once a predefined size or time threshold is reached. This project aims to give users more flexibility and control over when data becomes visible by introducing the following enhancements:
> # *Trigger Immediate Flush After a Specified Number of Imports*: Commit data automatically once a configurable number of import operations has accumulated.
> # *SYNC TABLE Syntax Support*: Let users explicitly trigger Group Commit for a table via SQL (e.g., {{SYNC TABLE table_name}}). The command returns only after the commit completes.
> # *System Table for Monitoring*: Add an {{information_schema.group_commit}} system table that tracks Group Commit status, with columns such as BE host, table ID, and commit metadata (e.g., batch size, latency).
>
> h2. *Technical Details*
> * *Languages*: C++ (core) and Java (SQL syntax integration).
> * *Tools*: GitHub for version control and collaborative development.
>
> h2. *Timeline (12+ Weeks, Full-Time Commitment, 30 hrs/week)*
> # *Community Bonding (Weeks 1-2)*
> ## Collaborate with mentors and the Apache Doris community.
> ## Set up the development environment and review the existing Group Commit implementation.
> ## Document the current Group Commit workflow and the proposed optimizations.
> # *Phase 1: Implementation & Testing (Weeks 3-6)*
> ## Develop support for flushing data after a configurable number of imports.
> ## Implement the {{SYNC TABLE}} syntax to trigger Group Commit manually.
> ## Design and integrate the {{information_schema.group_commit}} system table.
> ## Conduct performance benchmarking and rigorous testing.
> # *Phase 2: Refinement & Integration (Weeks 7+)*
> ## Address feedback from code reviews and community testing.
> ## Finalize documentation and ensure backward compatibility.
> ## Submit pull requests (PRs) and work toward merging the changes into the master branch.
>
> 🔹 *Total Effort*: 210+ hours
>
> h2. *Expected Outcomes*
> # Greater flexibility in Group Commit through configurable flush triggers (size, time, or import count).
> # A user-friendly {{SYNC TABLE}} SQL command for explicit commit control.
> # A monitoring system table ({{information_schema.group_commit}}) for real-time visibility into commit operations.
> # Robust performance validation and integration into Apache Doris’s core workflow.
>
> This project will give users finer control over data ingestion and visibility while preserving Doris’s high-throughput ingestion.
>
> h2. *Contact Information*
> * Mentor: [Yongqiang Yang|mailto:dataroar...@apache.org], Apache Doris PMC member
> * Mentor: [Yi Mei|mailto:zhangc...@apache.org], Apache HBase Committer
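The first two Synopsis items (an import-count flush trigger alongside the existing size and time thresholds, and a {{SYNC TABLE}} command that returns only after the commit completes) can be pictured with a small C++ sketch. Everything in it is hypothetical: {{GroupCommitQueue}}, {{FlushPolicy}} and their members are illustrative names, not Doris's actual backend classes, and {{commit_locked()}} only updates counters where a real backend would write and publish the load. A background thread commits a batch as soon as any threshold is hit, and {{sync()}} models SYNC TABLE by forcing a commit and blocking until everything buffered before the call is committed.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical thresholds: a batch is committed as soon as ANY one is reached.
struct FlushPolicy {
    std::size_t max_bytes;                // size trigger (existing behavior)
    std::chrono::milliseconds max_delay;  // time trigger (existing behavior)
    std::size_t max_imports;              // import-count trigger (proposed)
};

// Illustrative per-table buffer; not the actual Doris backend classes.
class GroupCommitQueue {
public:
    using Clock = std::chrono::steady_clock;

    explicit GroupCommitQueue(FlushPolicy policy)
        : policy_(policy), worker_([this] { run(); }) {
        assert(policy_.max_imports > 0);
    }

    ~GroupCommitQueue() {
        { std::lock_guard<std::mutex> lk(mu_); stop_ = true; }
        cv_.notify_all();
        worker_.join();  // the worker commits any leftover data before exiting
    }

    // Called once per import (e.g. one load request) with its payload size.
    void add_import(std::size_t bytes) {
        std::lock_guard<std::mutex> lk(mu_);
        if (pending_imports_ == 0) batch_start_ = Clock::now();
        pending_bytes_ += bytes;
        ++pending_imports_;
        ++total_imports_;
        cv_.notify_all();
    }

    // Models SYNC TABLE: force a commit of everything buffered so far and
    // return only once that data has been committed.
    void sync() {
        std::unique_lock<std::mutex> lk(mu_);
        const std::uint64_t target = total_imports_;
        if (committed_imports_ >= target) return;  // nothing pending
        force_ = true;
        cv_.notify_all();
        cv_.wait(lk, [&] { return committed_imports_ >= target; });
    }

    std::uint64_t committed_imports() const { std::lock_guard<std::mutex> lk(mu_); return committed_imports_; }
    std::uint64_t committed_batches() const { std::lock_guard<std::mutex> lk(mu_); return committed_batches_; }

private:
    bool should_flush_locked(Clock::time_point now) const {
        return force_ || pending_bytes_ >= policy_.max_bytes ||
               pending_imports_ >= policy_.max_imports ||
               now - batch_start_ >= policy_.max_delay;
    }
    // Stand-in for the real commit; here it only updates counters.
    void commit_locked() {
        committed_imports_ += pending_imports_;
        ++committed_batches_;
        pending_imports_ = 0;
        pending_bytes_ = 0;
        force_ = false;
        cv_.notify_all();  // wake any sync() callers
    }
    // Background committer thread: checks the triggers whenever it wakes up.
    void run() {
        std::unique_lock<std::mutex> lk(mu_);
        for (;;) {
            if (pending_imports_ > 0 && (stop_ || should_flush_locked(Clock::now()))) {
                commit_locked();
            } else if (stop_) {
                return;
            } else if (pending_imports_ > 0) {
                cv_.wait_until(lk, batch_start_ + policy_.max_delay);  // time trigger
            } else {
                cv_.wait(lk);  // idle until the next import or shutdown
            }
        }
    }

    const FlushPolicy policy_;
    mutable std::mutex mu_;
    std::condition_variable cv_;
    Clock::time_point batch_start_{};
    std::size_t pending_bytes_ = 0;
    std::size_t pending_imports_ = 0;
    std::uint64_t total_imports_ = 0;
    std::uint64_t committed_imports_ = 0;
    std::uint64_t committed_batches_ = 0;
    bool force_ = false;
    bool stop_ = false;
    std::thread worker_;  // declared last so it starts after all other members exist
};
```

A caller would construct one queue per table, call {{add_import()}} once per load, and call {{sync()}} where SYNC TABLE is executed. With {{max_imports = 3}}, three loads trigger a commit without waiting for the size or time threshold, and a following {{sync()}} commits the remainder. Committing while holding the mutex keeps the sketch short; a real implementation would presumably release the lock while writing data.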