[ 
https://issues.apache.org/jira/browse/GSOC-302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Calvin Kirs updated GSOC-302:
-----------------------------
    Labels: gsoc2025 longtime  (was: )

> Apache Doris: Enhancing Group Commit Functionality 
> ---------------------------------------------------
>
>                 Key: GSOC-302
>                 URL: https://issues.apache.org/jira/browse/GSOC-302
>             Project: Comdev GSOC
>          Issue Type: Wish
>            Reporter: Calvin Kirs
>            Priority: Major
>              Labels: gsoc2025, longtime
>
> h2. *Synopsis*
> The current Group Commit mechanism in Apache Doris batches data until a
> predefined size or time threshold is met before committing. This project aims
> to improve flexibility and control over data visibility by introducing the
> following enhancements:
>  # *Trigger Immediate Flush After a Specified Number of Imports*: Allow data
> to be committed automatically after accumulating a configurable number of
> import operations.
>  # *SYNC TABLE Syntax Support*: Enable users to explicitly trigger Group
> Commit for a table via SQL (e.g., {{SYNC TABLE table_name}}), ensuring
> the command returns only after the commit completes.
>  # *System Table for Monitoring*: Add an {{information_schema.group_commit}}
> system table to track Group Commit status, including columns such as BE host,
> table ID, and commit metadata (e.g., batch size, latency).
> h2. *Technical Details*
>  * *Languages*: C++ (core) and Java (SQL syntax integration).
>  * *Tools*: GitHub for version control and collaborative development.
> h2. *Timeline (12+ Weeks, Full-Time Commitment - 30 hrs/week)*
>  # *Community Bonding (Weeks 1-2)*
>  ## Collaborate with mentors and the Apache Doris community.
>  ## Set up the development environment and review the existing Group Commit
> implementation.
>  ## Document the current Group Commit workflow and proposed optimizations.
>  # *Phase 1: Implementation & Testing (Weeks 3-6)*
>  ## Develop support for flushing data after a configurable number of imports.
>  ## Implement the {{SYNC TABLE}} syntax to trigger manual Group Commit.
>  ## Design and integrate the {{information_schema.group_commit}} system table.
>  ## Conduct performance benchmarking and rigorous testing.
>  # *Phase 2: Refinement & Integration (Weeks 7+)*
>  ## Address feedback from code reviews and community testing.
>  ## Finalize documentation and ensure backward compatibility.
>  ## Submit pull requests (PRs) and work toward merging changes into the master
> branch.
> 🔹 *Total Effort*: 210+ hours
> *Expected Outcomes*
>  # Enhanced flexibility in Group Commit with configurable flush triggers (size,
> time, or import count).
>  # A user-friendly {{SYNC TABLE}} SQL command for explicit commit control.
>  # A monitoring system table ({{information_schema.group_commit}}) for
> real-time visibility into commit operations.
>  # Robust performance validation and integration into Apache Doris’s core
> workflow.
> This project will empower users with finer control over data ingestion and 
> visibility while maintaining Doris’s high-throughput capabilities.
>  
> *Contact Information*
>  * Mentor Name: Yongqiang Yang
> ([dataroar...@apache.org|mailto:dataroar...@apache.org]), Apache Doris
> PMC member
>  * Mentor Name: Yi Mei ([zhangc...@apache.org|mailto:zhangc...@apache.org]),
> Apache HBase Committer
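
The flush triggers described in the synopsis (size, time, and the proposed import count), together with an explicit sync, could be modelled roughly as below. This is only an illustrative sketch in Python, not Doris's C++ implementation: the class name `GroupCommitBuffer`, its parameters, and the default thresholds are all hypothetical, and the time threshold is checked only when a new import arrives, whereas a real backend would presumably also use a background timer.

```python
import time


class GroupCommitBuffer:
    """Toy per-table buffer. All names and defaults here are hypothetical."""

    def __init__(self, max_bytes=64 * 1024 * 1024, max_interval_s=10.0,
                 max_imports=100, clock=time.monotonic):
        self.max_bytes = max_bytes            # size threshold
        self.max_interval_s = max_interval_s  # time threshold
        self.max_imports = max_imports        # proposed import-count threshold
        self.clock = clock                    # injectable for testing
        self.committed = []                   # list of committed batches
        self._pending = []
        self._pending_bytes = 0
        self._opened_at = 0.0

    def add_import(self, payload: bytes) -> bool:
        """Buffer one import; return True if it triggered a commit."""
        if not self._pending:
            # The first import of a batch starts the batch timer.
            self._opened_at = self.clock()
        self._pending.append(payload)
        self._pending_bytes += len(payload)
        if self._should_flush():
            self.sync()
            return True
        return False

    def _should_flush(self) -> bool:
        # Any one of the three thresholds is enough to commit.
        return (
            self._pending_bytes >= self.max_bytes
            or len(self._pending) >= self.max_imports
            or self.clock() - self._opened_at >= self.max_interval_s
        )

    def sync(self) -> int:
        """Commit whatever is buffered, in the spirit of SYNC TABLE.

        Returns the number of imports committed (0 if nothing was pending).
        """
        count = len(self._pending)
        if count:
            self.committed.append(self._pending)
            self._pending = []
            self._pending_bytes = 0
        return count
```

With `max_imports=3`, the third call to `add_import` commits the batch on its own, and calling `sync()` at any point commits a partial batch immediately, which is the behaviour the proposed `SYNC TABLE` command is meant to expose through SQL.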



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
