[ https://issues.apache.org/jira/browse/HUDI-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lokesh Jain updated HUDI-9546:
------------------------------
Parent: HUDI-9281
Issue Type: Sub-task (was: Improvement)
> Performance improvements for streaming DAG write with secondary index
> ---------------------------------------------------------------------
>
> Key: HUDI-9546
> URL: https://issues.apache.org/jira/browse/HUDI-9546
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Lokesh Jain
> Priority: Critical
> Fix For: 1.1.0
>
>
> A couple of performance improvements on top of HUDI-9340.
> 1. While fetching the secondary key from a file group, we can project only the
> secondary key column instead of reading the entire record.
> 2. In HoodieAppendHandle, we can avoid reading the file slice twice to
> compute the secondary index changes. Instead, we can take the new records
> already available in the handle and merge them with the previous file slice
> to compute the secondary index changes.
> 3. We currently use toString() to get the string representation of the
> secondary key. We need to ensure this works for all data types, such as date
> and timestamp.
> [https://github.com/apache/hudi/blob/e017d85d76b5a2332e96ce0b7e4b2a552f98dadc/hudi-common/src/main/java/org/apache/hudi/metadata/SecondaryIndexRecordGenerationUtils.java#L259]
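Item 1 can be illustrated with a minimal sketch, assuming a columnar base file in which each column can be decoded independently. `ColumnarFile`, `fetchSecondaryKeys`, and the `city`/`payload` columns are hypothetical names for illustration, not Hudi APIs; only `_hoodie_record_key` mirrors Hudi's record key meta field. The point is that the lookup decodes two columns (record key and secondary key) instead of materializing the full record.

```java
import java.util.*;

class SecondaryKeyProjection {
    // Hypothetical stand-in for a columnar file: column name -> values, one per row.
    static final class ColumnarFile {
        private final Map<String, List<Object>> columns;
        int columnsDecoded = 0; // how many columns a read actually decoded

        ColumnarFile(Map<String, List<Object>> columns) { this.columns = columns; }

        // Decodes only the requested columns; all other columns are never touched.
        List<Map<String, Object>> read(List<String> projection) {
            int rows = columns.values().iterator().next().size();
            List<Map<String, Object>> out = new ArrayList<>();
            for (int i = 0; i < rows; i++) out.add(new HashMap<>());
            for (String col : projection) {
                List<Object> values = columns.get(col);
                columnsDecoded++;
                for (int i = 0; i < rows; i++) out.get(i).put(col, values.get(i));
            }
            return out;
        }
    }

    // Returns recordKey -> secondaryKey by projecting just those two columns.
    static Map<String, String> fetchSecondaryKeys(ColumnarFile file, String recordKeyCol, String secondaryKeyCol) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map<String, Object> row : file.read(List.of(recordKeyCol, secondaryKeyCol))) {
            Object sk = row.get(secondaryKeyCol);
            result.put((String) row.get(recordKeyCol), sk == null ? null : sk.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> data = new LinkedHashMap<>();
        data.put("_hoodie_record_key", List.of("k1", "k2"));
        data.put("city", List.of("sf", "nyc"));                        // hypothetical secondary key column
        data.put("payload", List.of("large blob 1", "large blob 2")); // never decoded by the lookup
        ColumnarFile file = new ColumnarFile(data);
        System.out.println(fetchSecondaryKeys(file, "_hoodie_record_key", "city"));
        System.out.println("columns decoded: " + file.columnsDecoded);
    }
}
```

With a real columnar format the saving comes from skipping the I/O and decoding of the unprojected columns, which matters most when records are wide.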
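Item 2 can be sketched as a single merge between the secondary keys from the previous file slice (read once) and the new records already held by the handle. This is a hypothetical illustration, not the actual HoodieAppendHandle logic: `SecondaryIndexDelta`, `compute`, and the convention that a `null` value marks a deleted record are assumptions of this sketch. An unchanged secondary key produces no index change, while an updated one produces a delete of the old entry plus an insert of the new one.

```java
import java.util.*;

class SecondaryIndexDelta {
    // Secondary index changes, each entry as (secondaryKey, recordKey).
    static final class Changes {
        final List<Map.Entry<String, String>> deletes = new ArrayList<>();
        final List<Map.Entry<String, String>> inserts = new ArrayList<>();
    }

    /**
     * previousSlice: recordKey -> secondaryKey as of the previous file slice.
     * newRecords: recordKey -> new secondaryKey from records held by the handle;
     * a null value marks a delete (an assumption of this sketch).
     */
    static Changes compute(Map<String, String> previousSlice, Map<String, String> newRecords) {
        Changes changes = new Changes();
        for (Map.Entry<String, String> e : newRecords.entrySet()) {
            String recordKey = e.getKey();
            String oldSk = previousSlice.get(recordKey);
            String newSk = e.getValue();
            if (Objects.equals(oldSk, newSk)) continue;                       // unchanged: nothing to do
            if (oldSk != null) changes.deletes.add(Map.entry(oldSk, recordKey)); // drop stale entry
            if (newSk != null) changes.inserts.add(Map.entry(newSk, recordKey)); // add current entry
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> previous = new LinkedHashMap<>();
        previous.put("k1", "sf");
        previous.put("k2", "nyc");
        previous.put("k3", "la");
        Map<String, String> incoming = new LinkedHashMap<>();
        incoming.put("k1", "sf");  // unchanged
        incoming.put("k2", "sea"); // secondary key updated
        incoming.put("k3", null);  // record deleted
        incoming.put("k4", "bos"); // new record
        Changes changes = compute(previous, incoming);
        System.out.println("deletes=" + changes.deletes + " inserts=" + changes.inserts);
    }
}
```

Only record keys touched by the new records are visited, so the previous slice's secondary keys need to be read once rather than reconstructing the whole merged slice a second time.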
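For item 3, if a value reaches toString() in its Avro physical form, a `date` (an `int` counting days since 1970-01-01) prints as a bare number such as `19000`, and `timestamp-millis`/`timestamp-micros` values (a `long` counting from the epoch) print as raw counts rather than readable timestamps. A minimal sketch, assuming the logical type name is available alongside the value: `SecondaryKeyStrings` and `toSecondaryKeyString` are hypothetical, and rendering to ISO-8601 strings is an assumed choice, not necessarily what Hudi does.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class SecondaryKeyStrings {
    // Hypothetical helper; logical type names mirror Avro's, the mapping is an assumption.
    static String toSecondaryKeyString(Object value, String logicalType) {
        if (value == null) return null;
        if (logicalType == null) return value.toString(); // plain types keep toString()
        switch (logicalType) {
            case "date": // days since 1970-01-01
                return LocalDate.ofEpochDay(((Number) value).longValue()).toString();
            case "timestamp-millis": // milliseconds since the epoch, UTC
                return Instant.ofEpochMilli(((Number) value).longValue()).toString();
            case "timestamp-micros": // microseconds since the epoch, UTC
                return Instant.EPOCH.plus(((Number) value).longValue(), ChronoUnit.MICROS).toString();
            default:
                return value.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(toSecondaryKeyString(19000, "date"));
        System.out.println(toSecondaryKeyString(1500L, "timestamp-millis"));
        System.out.println(toSecondaryKeyString(1_000_001L, "timestamp-micros"));
    }
}
```

Routing every conversion through one helper keeps the string form identical wherever the secondary key is computed, so index writes and lookups agree on the same key.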
--
This message was sent by Atlassian Jira
(v8.20.10#820010)