[jira] [Created] (HUDI-8339) Avoid glob paths and use the log record reader to build functional index

Sagar Sumit (Jira) Fri, 11 Oct 2024 21:47:04 -0700

Sagar Sumit created HUDI-8339:
---------------------------------

             Summary: Avoid glob paths and use the log record reader to build functional index
                 Key: HUDI-8339
                 URL: https://issues.apache.org/jira/browse/HUDI-8339
             Project: Apache Hudi
          Issue Type: Task
            Reporter: Sagar Sumit
             Fix For: 1.0.0



To build the functional index in Spark, the spark-sql functions are applied over
a Dataset<Row> that is loaded using glob paths here:
[https://github.com/apache/hudi/blob/7530e4fa48fb6c32e9cafb587914521bbbb4bc23/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkMetadataWriterUtils.java#L164]

Glob paths may introduce inefficiencies because they require file listing.
Instead, use the usual HoodieUnMergedLogRecordReader to load the records as
HoodieRecord, fetch the explicit column values of interest (those required for
the functional index), attach the file name, and create a Row directly (perhaps
using RowFactory). Then create a Dataset from these rows and apply the
functional index over it.
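The proposed flow can be sketched in plain Java, without Spark or Hudi on the
classpath. Everything below is a hypothetical illustration: LogRecord stands in
for a HoodieRecord returned by the log record reader, IndexRow stands in for the
Spark Row that would be built with RowFactory.create(...), the file name is made
up, and upper-casing stands in for whatever spark-sql function the functional
index defines. In Spark, the rows would be turned into a Dataset<Row> with
SparkSession.createDataFrame(List<Row>, StructType) before applying the index
expression.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

public class FunctionalIndexRowsSketch {

    /** Hypothetical stand-in for a HoodieRecord produced by the log record reader. */
    static final class LogRecord {
        final String recordKey;
        final Map<String, Object> fields;

        LogRecord(String recordKey, Map<String, Object> fields) {
            this.recordKey = recordKey;
            this.fields = fields;
        }
    }

    /** Hypothetical stand-in for a Spark Row: (file name, record key, indexed value). */
    static final class IndexRow {
        final String fileName;
        final String recordKey;
        final Object value;

        IndexRow(String fileName, String recordKey, Object value) {
            this.fileName = fileName;
            this.recordKey = recordKey;
            this.value = value;
        }
    }

    /** Keep only the column the index needs and attach the source file name. */
    static List<IndexRow> toRows(String fileName, List<LogRecord> records, String column) {
        List<IndexRow> rows = new ArrayList<>();
        for (LogRecord r : records) {
            // A missing column (e.g. a delete marker) yields null, like a SQL NULL.
            rows.add(new IndexRow(fileName, r.recordKey, r.fields.get(column)));
        }
        return rows;
    }

    /** Apply the index expression to each value; nulls pass through unchanged. */
    static List<IndexRow> applyIndexFunction(List<IndexRow> rows, Function<Object, Object> fn) {
        List<IndexRow> out = new ArrayList<>();
        for (IndexRow row : rows) {
            Object v = row.value == null ? null : fn.apply(row.value);
            out.add(new IndexRow(row.fileName, row.recordKey, v));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> a = new HashMap<>();
        a.put("city", "paris");
        Map<String, Object> b = new HashMap<>();
        b.put("city", "oslo");
        List<LogRecord> records = new ArrayList<>();
        records.add(new LogRecord("k1", a));
        records.add(new LogRecord("k2", b));

        // Hypothetical file name: it is known from the file slice, so no glob listing is needed.
        List<IndexRow> rows = toRows("file-1.log", records, "city");
        List<IndexRow> indexed =
                applyIndexFunction(rows, v -> v.toString().toUpperCase(Locale.ROOT));
        for (IndexRow r : indexed) {
            System.out.println(r.fileName + "," + r.recordKey + "," + r.value);
        }
    }
}
```

The point of the design is that the file name is attached explicitly while the
records are read, so nothing has to be recovered by listing or pattern-matching
paths afterwards.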



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
