[ https://issues.apache.org/jira/browse/HUDI-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
shibei updated HUDI-3582:
-------------------------
Description:
Currently, the record-level index is implemented mainly for `tagLocation`, and
queries do not benefit from it; see
[https://github.com/apache/hudi/pull/3508] for more detail. In this issue, we
will implement a record-level index based on Apache Lucene to gain the
following abilities:
1. For point queries, we can get the file-level row number from the
record-level index and combine it with the Parquet column index support
brought by Spark 3.2 to achieve precise reads;
2. For `tagLocation` in
> Support record level index based on Apache Lucene to improve query/upsert
> performance
> -------------------------------------------------------------------------------------
>
> Key: HUDI-3582
> URL: https://issues.apache.org/jira/browse/HUDI-3582
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: shibei
> Assignee: shibei
> Priority: Major