[jira] [Updated] (HUDI-3582) Support record level index based on Apache Lucene to improve query/upsert performance

shibei (Jira) Tue, 08 Mar 2022 01:13:05 -0800


     [ https://issues.apache.org/jira/browse/HUDI-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


shibei updated HUDI-3582:
-------------------------
    Description: 
Currently, the record-level index is implemented mainly for `tagLocation`, so 
queries do not benefit from it; see 
[https://github.com/apache/hudi/pull/3508] for more detail. In this issue, we 
will implement a record-level index based on Apache Lucene to gain the following 
abilities:
1. For point queries, the record-level index gives us the row number of a record 
within its file; combined with the Parquet column index support brought by 
Spark 3.2, this lets a read target exactly the matching rows;

2. For `tagLocation` in 
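
As a rough illustration of item 1 (the point-query path), the sketch below shows how a row number taken from a record-level index could be narrowed down to a single Parquet page. It is not Hudi or Lucene code: the in-memory dictionary stands in for the Lucene-backed index, and the names `RecordLocation`, `record_index`, `page_first_rows`, and `locate_page`, as well as the file names and row counts, are hypothetical. The only assumption about Parquet is that the first row number of each page is known, which is the kind of information its page-level index metadata records.

```python
from bisect import bisect_right
from typing import Dict, List, NamedTuple, Optional, Tuple


class RecordLocation(NamedTuple):
    file_name: str   # hypothetical: data file that holds the record
    row_number: int  # zero-based row position inside that file


# Hypothetical stand-in for the Lucene-backed index: record key -> location.
record_index: Dict[str, RecordLocation] = {
    "key-001": RecordLocation("file-a.parquet", 0),
    "key-002": RecordLocation("file-a.parquet", 2500),
    "key-003": RecordLocation("file-b.parquet", 17),
}

# Hypothetical page boundaries per file: first row number of each page,
# in ascending order (made-up values for illustration).
page_first_rows: Dict[str, List[int]] = {
    "file-a.parquet": [0, 1000, 2000, 3000],
    "file-b.parquet": [0, 500],
}


def locate_page(record_key: str) -> Optional[Tuple[str, int, int]]:
    """Return (file_name, page_index, row_offset_in_page), or None if the key is unknown."""
    location = record_index.get(record_key)
    if location is None:
        return None
    first_rows = page_first_rows[location.file_name]
    # Pick the rightmost page whose first row is <= the target row.
    page_index = bisect_right(first_rows, location.row_number) - 1
    row_offset = location.row_number - first_rows[page_index]
    return (location.file_name, page_index, row_offset)
```

With these made-up values, looking up `key-002` resolves to the third page of `file-a.parquet`, so a reader would only need to decode that one page rather than scan the whole file.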

> Support record level index based on Apache Lucene to improve query/upsert 
> performance
> -------------------------------------------------------------------------------------
>
>                 Key: HUDI-3582
>                 URL: https://issues.apache.org/jira/browse/HUDI-3582
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: shibei
>            Assignee: shibei
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
