[jira] [Updated] (HUDI-9546) Performance improvements for streaming DAG write with secondary index

Lokesh Jain (Jira) Wed, 25 Jun 2025 12:34:44 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Lokesh Jain updated HUDI-9546:
------------------------------
        Parent: HUDI-9281
    Issue Type: Sub-task  (was: Improvement)

> Performance improvements for streaming DAG write with secondary index
> ---------------------------------------------------------------------
>
>                 Key: HUDI-9546
>                 URL: https://issues.apache.org/jira/browse/HUDI-9546
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Lokesh Jain
>            Priority: Critical
>             Fix For: 1.1.0
>
>
> A couple of performance improvements on top of HUDI-9340:
> 1. While fetching the secondary key from a file group, we can project only 
> the secondary key column instead of reading the entire record.
> 2. In HoodieAppendHandle, we can avoid reading the file slice twice to 
> compute the secondary index changes. Instead, we can take the new records 
> already available in the handle and merge them with the previous file slice 
> to compute the secondary-index-related changes.
> 3. We currently use toString to get the string representation of the 
> secondary key. We need to ensure this works with all data types, such as 
> date and timestamp.
> [https://github.com/apache/hudi/blob/e017d85d76b5a2332e96ce0b7e4b2a552f98dadc/hudi-common/src/main/java/org/apache/hudi/metadata/SecondaryIndexRecordGenerationUtils.java#L259]
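
Item 2 above can be illustrated with a minimal, self-contained sketch. This is not Hudi code: the class SecondaryIndexDiffSketch, the Diff holder, and computeDiff are hypothetical names, and the sketch assumes that both the previous file slice and the new records in the handle have already been reduced to maps from record key to secondary key (with a null secondary key in the incoming map standing for a delete). It only shows how the index changes could be derived in one pass over the new records, without re-reading the merged file slice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only; none of these names exist in Hudi.
final class SecondaryIndexDiffSketch {

    /** Secondary index entries to remove and to add, keyed by record key. */
    static final class Diff {
        final Map<String, String> toDelete = new HashMap<>();
        final Map<String, String> toInsert = new HashMap<>();
    }

    /**
     * Computes index changes from the previous file slice's
     * recordKey -> secondaryKey mapping and the new records in the handle.
     * Assumption: a null secondary key in {@code incoming} marks a delete.
     */
    static Diff computeDiff(Map<String, String> previous, Map<String, String> incoming) {
        Diff diff = new Diff();
        for (Map.Entry<String, String> entry : incoming.entrySet()) {
            String recordKey = entry.getKey();
            String newSecondaryKey = entry.getValue();
            String oldSecondaryKey = previous.get(recordKey);
            if (Objects.equals(oldSecondaryKey, newSecondaryKey)) {
                continue; // secondary key unchanged, nothing to update
            }
            if (oldSecondaryKey != null) {
                diff.toDelete.put(recordKey, oldSecondaryKey); // stale entry
            }
            if (newSecondaryKey != null) {
                diff.toInsert.put(recordKey, newSecondaryKey); // new or changed entry
            }
        }
        return diff;
    }
}
```

Only record keys present in the incoming batch are visited, so the cost scales with the number of new records rather than with the size of the file slice. How the previous mapping is obtained (for example, via the projection described in item 1) is outside the scope of this sketch.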



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
