[
https://issues.apache.org/jira/browse/HUDI-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan updated HUDI-3582:
--------------------------------------
Component/s: index
> Support record level index based on Apache Lucene to improve
> query/tagLocation performance
> ------------------------------------------------------------------------------------------
>
> Key: HUDI-3582
> URL: https://issues.apache.org/jira/browse/HUDI-3582
> Project: Apache Hudi
> Issue Type: New Feature
> Components: index
> Reporter: shibei
> Assignee: shibei
> Priority: Major
>
> Currently, the record level index is implemented mainly for `tagLocation`, and
> queries do not benefit from it; see
> [https://github.com/apache/hudi/pull/3508] for more detail. In this issue, we
> will implement a record level index based on Apache Lucene to gain the
> following abilities:
> 1. For point queries, we can get the file-level row number from the record
> level index and combine it with the Parquet column index support brought by
> Spark 3.2 to achieve precise reads;
> 2. For `tagLocation`, we can get record location info directly from the
> record level index.
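
To make the two abilities above concrete, here is a minimal sketch of the idea. It is not the proposed implementation: a plain in-memory dict stands in for the Lucene-backed index, and the names `RecordLocation`, `RecordLevelIndex`, `index_file`, `lookup` and `tag_location` are hypothetical, chosen only for illustration. It assumes each record key maps to one file and one row number within that file.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class RecordLocation:
    """Hypothetical index entry: the file holding a key and the row within it."""
    file_id: str
    row_number: int


class RecordLevelIndex:
    """Toy in-memory stand-in for the Lucene-based index (assumption, not Hudi API)."""

    def __init__(self) -> None:
        self._locations: Dict[str, RecordLocation] = {}

    def index_file(self, file_id: str, record_keys: Iterable[str]) -> None:
        # Row numbers are positions within the file, starting at 0.
        for row_number, key in enumerate(record_keys):
            self._locations[key] = RecordLocation(file_id, row_number)

    def lookup(self, record_key: str) -> Optional[RecordLocation]:
        # Point query: resolve a key to (file, row) without scanning any file.
        return self._locations.get(record_key)

    def tag_location(
        self, record_keys: Iterable[str]
    ) -> List[Tuple[str, Optional[RecordLocation]]]:
        # Pair each incoming key with its known location; None means the key
        # is not indexed yet and would be treated as a new record.
        return [(key, self.lookup(key)) for key in record_keys]


index = RecordLevelIndex()
index.index_file("file-a", ["k1", "k2", "k3"])
index.index_file("file-b", ["k4", "k5"])

point = index.lookup("k5")                 # location used by a point query
tagged = index.tag_location(["k2", "k9"])  # "k9" is not indexed yet
```

In this sketch a point query would use the returned row number to narrow the read inside the file, while `tag_location` uses the same lookup to separate existing records from new ones.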
--
This message was sent by Atlassian Jira
(v8.20.1#820001)