[jira] [Updated] (HUDI-3582) Introduce Secondary Index to Improve HUDI Query Performance

shibei (Jira) Mon, 25 Apr 2022 19:00:07 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


shibei updated HUDI-3582:
-------------------------
    Summary: Introduce Secondary Index to Improve HUDI Query Performance  (was: 
Support record level index based on Apache Lucene to improve query/tagLocation 
performance)

> Introduce Secondary Index to Improve HUDI Query Performance
> -----------------------------------------------------------
>
>                 Key: HUDI-3582
>                 URL: https://issues.apache.org/jira/browse/HUDI-3582
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: index
>            Reporter: shibei
>            Assignee: shibei
>            Priority: Blocker
>             Fix For: 0.12.0
>
>
> Currently, the record level index is implemented mainly for `tagLocation`, 
> and queries do not benefit from it; see 
> [https://github.com/apache/hudi/pull/3508] for more detail. In this issue, we 
> will implement a record level index based on Apache Lucene to gain the 
> following abilities:
> 1. For point queries, we can get the file level row number from the record 
> level index and combine it with the Parquet column index support introduced 
> in Spark 3.2 to achieve precise reads;
> 2. For `tagLocation`, we can get record location info directly from the 
> record level index.
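
The two abilities listed above can be illustrated with a toy, in-memory
sketch. This is not Hudi's or Lucene's API: `RecordLocation`,
`ToyRecordIndex`, `build_index`, and `point_query` are hypothetical names
introduced only for illustration, and a plain dict stands in for the
Lucene-backed index. The assumption is that the index maps a record key to a
file id plus a row number inside that file.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class RecordLocation:
    """Hypothetical location of one record: which file, which row."""
    file_id: str
    row_number: int


class ToyRecordIndex:
    """Dict-backed stand-in for a record level index (illustration only)."""

    def __init__(self) -> None:
        self._locations: Dict[str, RecordLocation] = {}

    def add(self, key: str, file_id: str, row_number: int) -> None:
        self._locations[key] = RecordLocation(file_id, row_number)

    def lookup(self, key: str) -> Optional[RecordLocation]:
        return self._locations.get(key)

    def tag_location(
        self, keys: Iterable[str]
    ) -> List[Tuple[str, Optional[RecordLocation]]]:
        # Ability 2: pair each incoming key with its known location,
        # or None when the key is new (an insert rather than an update).
        return [(key, self.lookup(key)) for key in keys]


def build_index(files: Dict[str, List[dict]], key_field: str) -> ToyRecordIndex:
    """Index every row of every file by its record key."""
    index = ToyRecordIndex()
    for file_id, rows in files.items():
        for row_number, row in enumerate(rows):
            index.add(row[key_field], file_id, row_number)
    return index


def point_query(
    index: ToyRecordIndex, files: Dict[str, List[dict]], key: str
) -> Optional[dict]:
    # Ability 1: jump straight to the row instead of scanning every file.
    location = index.lookup(key)
    if location is None:
        return None
    return files[location.file_id][location.row_number]


if __name__ == "__main__":
    files = {
        "f1": [{"id": "a", "v": 1}, {"id": "b", "v": 2}],
        "f2": [{"id": "c", "v": 3}],
    }
    index = build_index(files, key_field="id")
    print(point_query(index, files, "b"))
    print(index.tag_location(["c", "z"]))
```

In a real table the row number would be used to skip to the relevant Parquet
page rather than to index a Python list; the sketch only shows the shape of
the lookup.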



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
