[ https://issues.apache.org/jira/browse/HUDI-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
shibei updated HUDI-3582:
-------------------------
Summary: Introduce Secondary Index to Improve HUDI Query Performance (was:
Support record level index based on Apache Lucene to improve query/tagLocation
performance)
> Introduce Secondary Index to Improve HUDI Query Performance
> -----------------------------------------------------------
>
> Key: HUDI-3582
> URL: https://issues.apache.org/jira/browse/HUDI-3582
> Project: Apache Hudi
> Issue Type: New Feature
> Components: index
> Reporter: shibei
> Assignee: shibei
> Priority: Blocker
> Fix For: 0.12.0
>
>
> Currently, the record-level index is implemented mainly for `tagLocation`, so
> queries do not benefit from it; see
> [https://github.com/apache/hudi/pull/3508] for more detail. In this issue, we
> will implement a record-level index based on Apache Lucene to provide the
> following abilities:
> 1. For point queries, we can get file-level row numbers from the record-level
> index and, combined with the Parquet column index support available since
> Spark 3.2, read only the rows that are actually needed;
> 2. For `tagLocation`, we can get record location info directly from the
> record-level index.
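The two abilities above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Hudi code: `RecordLevelIndex`, `RecordLocation`, `tag_location`, and `rows_for_point_query` are hypothetical names, and a plain dictionary stands in for the Lucene index the issue proposes. The point is the shape of the data: each record key maps to a (partition, file, row number) location, which serves both `tagLocation` and point queries.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RecordLocation:
    partition_path: str
    file_id: str
    row_number: int  # 0-based position of the record inside its base file


class RecordLevelIndex:
    """Hypothetical in-memory stand-in for a Lucene-backed record-level index."""

    def __init__(self) -> None:
        self._entries: Dict[str, RecordLocation] = {}

    def upsert(self, record_key: str, location: RecordLocation) -> None:
        # Record (or overwrite) where a key currently lives.
        self._entries[record_key] = location

    def tag_location(self, record_keys: List[str]) -> Dict[str, Optional[RecordLocation]]:
        # Ability 2: look up each incoming key directly; None marks a new
        # record (an insert), a location marks an update to an existing file.
        return {key: self._entries.get(key) for key in record_keys}

    def rows_for_point_query(self, record_keys: List[str]) -> Dict[Tuple[str, str], List[int]]:
        # Ability 1: group matching row numbers by (partition, file) so a
        # reader opens only those files and, using Parquet column/offset
        # indexes, only the pages that contain these rows.
        rows: Dict[Tuple[str, str], List[int]] = defaultdict(list)
        for key in record_keys:
            loc = self._entries.get(key)
            if loc is not None:
                rows[(loc.partition_path, loc.file_id)].append(loc.row_number)
        return {file_key: sorted(nums) for file_key, nums in rows.items()}


if __name__ == "__main__":
    index = RecordLevelIndex()
    index.upsert("uuid-1", RecordLocation("2022/03/01", "file-a", 10))
    index.upsert("uuid-2", RecordLocation("2022/03/01", "file-a", 3))
    index.upsert("uuid-3", RecordLocation("2022/03/02", "file-b", 0))
    print(index.rows_for_point_query(["uuid-1", "uuid-2", "uuid-9"]))
    # {('2022/03/01', 'file-a'): [3, 10]}
```

Keys missing from the index are simply skipped for point queries, and for `tagLocation` they come back as `None`, which is how a writer could tell inserts apart from updates without scanning file footers or bloom filters.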
--
This message was sent by Atlassian Jira
(v8.20.7#820007)