[ 
https://issues.apache.org/jira/browse/HUDI-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-8339:
------------------------------
    Sprint: Hudi 1.0 Sprint2024/10/7-10/13

> Avoid glob paths and use the log record reader to build functional index
> ------------------------------------------------------------------------
>
>                 Key: HUDI-8339
>                 URL: https://issues.apache.org/jira/browse/HUDI-8339
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Assignee: Sagar Sumit
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> To build a functional index in Spark, the spark-sql functions are applied over
> a Dataset<Row> that is loaded using glob paths here:
> [https://github.com/apache/hudi/blob/7530e4fa48fb6c32e9cafb587914521bbbb4bc23/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkMetadataWriterUtils.java#L164]
> Glob paths might introduce some inefficiencies due to file listing. Hence, use
> the usual HoodieUnMergedLogRecordReader to load the dataset as HoodieRecord,
> fetch only the column values of interest (those required for the functional
> index), attach the file name, and create a Row directly (maybe via RowFactory).
> Then create a dataset out of these rows and apply the functional index over it.
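
The data flow proposed in the description can be modelled in plain Java as a rough sketch. Nothing here uses the Hudi or Spark APIs: LogRecord and IndexRow are hypothetical stand-ins for HoodieRecord and Row, the "ts" column and the seconds-to-day function are made-up examples, and skipping records that lack the column is an assumption of this sketch rather than something the issue states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class FunctionalIndexSketch {

    /** Hypothetical stand-in for a record returned by a log record reader. */
    static final class LogRecord {
        final String fileName;
        final Map<String, Object> fields;

        LogRecord(String fileName, Map<String, Object> fields) {
            this.fileName = fileName;
            this.fields = fields;
        }
    }

    /** Hypothetical stand-in for a Row: one column value plus its source file name. */
    static final class IndexRow {
        final Object columnValue;
        final String fileName;

        IndexRow(Object columnValue, String fileName) {
            this.columnValue = columnValue;
            this.fileName = fileName;
        }
    }

    /** Convenience factory for a single-field record, used only by this sketch. */
    static LogRecord record(String fileName, String column, Object value) {
        Map<String, Object> fields = new HashMap<>();
        fields.put(column, value);
        return new LogRecord(fileName, fields);
    }

    /** Keep only the column of interest and attach the file name to each row. */
    static List<IndexRow> toRows(List<LogRecord> records, String column) {
        List<IndexRow> rows = new ArrayList<>();
        for (LogRecord r : records) {
            Object value = r.fields.get(column);
            if (value != null) { // assumption: records without the column are skipped
                rows.add(new IndexRow(value, r.fileName));
            }
        }
        return rows;
    }

    /** Apply the index function to every row and group the results by file name. */
    static <R> Map<String, List<R>> applyIndexFunction(List<IndexRow> rows,
                                                       Function<Object, R> fn) {
        Map<String, List<R>> byFile = new LinkedHashMap<>();
        for (IndexRow row : rows) {
            byFile.computeIfAbsent(row.fileName, k -> new ArrayList<>())
                  .add(fn.apply(row.columnValue));
        }
        return byFile;
    }

    public static void main(String[] args) {
        List<LogRecord> records = Arrays.asList(
            record("file-1.log", "ts", 86400L),
            record("file-2.log", "ts", 172800L),
            record("file-2.log", "other", 5L));
        // Example function: epoch seconds to day number (integer division).
        Function<Object, Long> toDay = v -> ((Number) v).longValue() / 86400L;
        Map<String, List<Long>> result = applyIndexFunction(toRows(records, "ts"), toDay);
        System.out.println(result);
    }
}
```

In an actual Spark job the rows would presumably be built with RowFactory and turned into a Dataset<Row> before the spark-sql function is applied; the sketch only illustrates that no glob-based listing is needed once the records and their file names are already in hand.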



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
