[ 
https://issues.apache.org/jira/browse/GSOC-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Solodovnik updated GSOC-253:
----------------------------------
    Labels: Doris full-time gsoc2024  (was: full-time gsoc2024)

> [GSoC][Doris]Support UPDATE for Doris Duplicate Key Table
> ---------------------------------------------------------
>
>                 Key: GSOC-253
>                 URL: https://issues.apache.org/jira/browse/GSOC-253
>             Project: Comdev GSOC
>          Issue Type: New Feature
>            Reporter: Calvin Kirs
>            Priority: Major
>              Labels: Doris, full-time, gsoc2024
>
> h2. Objectives
> Support UPDATE for Doris Duplicate Key Table
> Doris currently supports three data models: Duplicate Key, Aggregate Key, and 
> Unique Key, of which Unique Key has full data update support (including the 
> UPDATE statement). As Doris has become more widely used, users are asking 
> more of it. For example, some users need to run ETL processing inside Doris 
> on Duplicate Key tables and would like those tables to support UPDATE as 
> well. Because a Duplicate Key table has no primary key that can locate one 
> specific row, UPDATE is inefficient. The usual practice is to rewrite the 
> data: even if the user updates only one field of one row, at least the 
> segment file containing that row must be rewritten. 
> Another, potentially more efficient solution is to implement Duplicate Key by 
> combining Unique Key's Merge-on-Write (MoW) with an auto_increment column. 
> That is, change the underlying implementation of Duplicate Key to use Unique 
> Key MoW and add a hidden auto_increment column to the primary key, so that no 
> two keys written by the user to the Unique Key MoW table are duplicates. 
> This preserves the semantics of Duplicate Key, and because each row has a 
> unique primary key, the UPDATE capability of Unique Key can be reused to 
> support UPDATE on Duplicate Key tables.
> We would like participants to help design and implement the solution, run 
> performance tests for comparison, and carry out performance optimization.
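
The hidden-column idea in the Objectives above can be sketched with a small
in-memory model. This is only a toy illustration, not Doris code: the names
Row, DuplicateKeyOnUniqueKey, hidden_id, insert, update, count, and sum are
all hypothetical, and a std::map stands in for the Unique Key MoW table. It
models the key semantics only (duplicates survive because the hidden id makes
every primary key distinct, and UPDATE can address rows one by one), and says
nothing about storage layout or performance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical row layout: one user-visible key column and one value column.
struct Row {
    std::string user_key;  // key as written by the user (may repeat)
    int64_t value;         // non-key column that UPDATE may change
};

// Toy model of a Duplicate Key table layered on a unique-key store.
// The primary key is (user_key, hidden_id); hidden_id plays the role of the
// hidden auto_increment column, so two inserted rows never collide.
class DuplicateKeyOnUniqueKey {
public:
    // Insert never overwrites an existing row: each row gets a fresh hidden id.
    void insert(const std::string& user_key, int64_t value) {
        rows_[std::make_pair(user_key, next_hidden_id_++)] = value;
    }

    // Models "UPDATE t SET value = new_value WHERE pred(row)".
    // Each matching row is reached through its unique primary key and changed
    // in place; rows that do not match are left untouched.
    // Returns the number of rows updated.
    std::size_t update(const std::function<bool(const Row&)>& pred,
                       int64_t new_value) {
        std::size_t updated = 0;
        for (auto& entry : rows_) {
            Row row{entry.first.first, entry.second};
            if (pred(row)) {
                entry.second = new_value;
                ++updated;
            }
        }
        return updated;
    }

    // Number of stored rows that share the given user-visible key.
    std::size_t count(const std::string& user_key) const {
        std::size_t n = 0;
        for (const auto& entry : rows_) {
            if (entry.first.first == user_key) ++n;
        }
        return n;
    }

    // Sum of the value column over rows with the given user-visible key.
    int64_t sum(const std::string& user_key) const {
        int64_t total = 0;
        for (const auto& entry : rows_) {
            if (entry.first.first == user_key) total += entry.second;
        }
        return total;
    }

    std::size_t size() const { return rows_.size(); }

private:
    // (user_key, hidden_id) -> value; the map enforces primary-key uniqueness.
    std::map<std::pair<std::string, int64_t>, int64_t> rows_;
    int64_t next_hidden_id_ = 1;  // stand-in for the auto_increment counter
};
```

In this sketch, inserting the same user key twice leaves two rows in the
table, and an update whose predicate matches that key changes both of them
without touching rows under other keys.
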
> h2. Recommended Skills
> Familiar with C++ programming
> Familiar with the storage layer of Doris
> h2. Mentor
> Mentor: Chen Zhang, Apache Doris Committer, chzhang1...@gmail.com
> Mentor: Guolei Yi, Apache Doris PMC Member, yiguo...@gmail.com
> Mailing List: d...@doris.apache.org
> Website: [https://doris.apache.org|https://doris.apache.org/]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)