Sagar Sumit created HUDI-8339:
---------------------------------
Summary: Avoid glob paths and use the log record reader to build
functional index
Key: HUDI-8339
URL: https://issues.apache.org/jira/browse/HUDI-8339
Project: Apache Hudi
Issue Type: Task
Reporter: Sagar Sumit
Fix For: 1.0.0
To build a functional index in Spark, the spark-sql functions are applied over
a Dataset<Row> that is loaded using glob paths here -
[https://github.com/apache/hudi/blob/7530e4fa48fb6c32e9cafb587914521bbbb4bc23/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/utils/SparkMetadataWriterUtils.java#L164]
There might be some inefficiencies with glob paths due to file listing. Hence,
use the usual HoodieUnMergedLogRecordReader to load the data as HoodieRecord,
fetch only the column values of interest (those required for the functional
index), attach the file name, and create a Row directly (perhaps via
RowFactory). Then create a dataset out of those rows and apply the functional
index over it.
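
A rough sketch of the row-building step described above, in plain Java with no
Hudi or Spark dependency. Everything here is hypothetical and for illustration
only: LogRecord stands in for a HoodieRecord read from a log file, and
Object[] stands in for a Spark Row (in the real change each array would
presumably be passed to RowFactory.create). It only shows picking the columns
of interest and appending the file name; it is not the actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FunctionalIndexRowSketch {

    // Hypothetical stand-in for a record returned by the log record reader.
    static final class LogRecord {
        final String fileName;
        final Map<String, Object> fields;

        LogRecord(String fileName, Map<String, Object> fields) {
            this.fileName = fileName;
            this.fields = fields;
        }
    }

    // Builds one row per record: the requested column values in order,
    // followed by the name of the file the record came from.
    // A column missing from a record yields null in that position.
    static List<Object[]> toRows(List<LogRecord> records, List<String> columns) {
        List<Object[]> rows = new ArrayList<>();
        for (LogRecord record : records) {
            Object[] row = new Object[columns.size() + 1];
            for (int i = 0; i < columns.size(); i++) {
                row[i] = record.fields.get(columns.get(i));
            }
            row[columns.size()] = record.fileName;
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new HashMap<>();
        fields.put("ts", 1700000000L);
        fields.put("rider", "rider-A");
        fields.put("fare", 19.5);

        List<LogRecord> records = new ArrayList<>();
        records.add(new LogRecord("file-1.log", fields));

        // Only "ts" is needed for a hypothetical index on a function of ts.
        for (Object[] row : toRows(records, Arrays.asList("ts"))) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Projecting only the needed columns before creating rows keeps the intermediate
dataset narrow, which is the point of fetching explicit column values instead
of loading full rows through glob paths.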
--
This message was sent by Atlassian Jira
(v8.20.10#820010)