Prashant Wason created HUDI-1469:
------------------------------------
Summary: Faster initialization for larger datasets
Key: HUDI-1469
URL: https://issues.apache.org/jira/browse/HUDI-1469
Project: Apache Hudi
Issue Type: Sub-task
Reporter: Prashant Wason
Assignee: Prashant Wason
Fix For: 0.7.0
For very large tables (200+ partitions and 100K+ files), the current
initialization code in HoodieBackedTableMetadataWriter is slow because it
lists all partitions and files sequentially.
The code is also inefficient because it lists each directory twice: first to
get the list of partitions and later to get the list of files. Both can be
gathered in a single listing pass.
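
A rough sketch of the idea, not the actual HoodieBackedTableMetadataWriter
code: each directory is listed exactly once, its entries are split into
sub-directories and files in that same pass, and all directories at one depth
are listed in parallel. The sketch uses java.nio.file on a local file system
instead of the Hadoop FileSystem API, and the names SinglePassListingSketch,
DirListing, listOnce and listPartitionsAndFiles are hypothetical. It also
assumes that any directory directly containing files is a partition and that
dot-prefixed directories such as .hoodie are metadata to be skipped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: single-pass, level-by-level parallel listing.
class SinglePassListingSketch {

    /** What one listing of a single directory produced. */
    static final class DirListing {
        final Path dir;
        final List<Path> subDirs = new ArrayList<>();
        final List<String> fileNames = new ArrayList<>();

        DirListing(Path dir) {
            this.dir = dir;
        }
    }

    /** Lists a directory once and classifies every entry in that same pass. */
    static DirListing listOnce(Path dir) {
        DirListing result = new DirListing(dir);
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (Files.isDirectory(entry)) {
                    // Assumption: dot-prefixed directories hold metadata, not data.
                    if (!name.startsWith(".")) {
                        result.subDirs.add(entry);
                    }
                } else {
                    result.fileNames.add(name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** Maps each partition path (relative to root) to the files it contains. */
    static Map<String, List<String>> listPartitionsAndFiles(Path root) {
        Map<String, List<String>> filesByPartition = new TreeMap<>();
        List<Path> level = new ArrayList<>();
        level.add(root);
        while (!level.isEmpty()) {
            // All directories at the current depth are listed in parallel.
            List<DirListing> listings = level.parallelStream()
                    .map(SinglePassListingSketch::listOnce)
                    .collect(Collectors.toList());
            List<Path> next = new ArrayList<>();
            for (DirListing listing : listings) {
                // Assumption: a directory that directly holds files is a partition.
                if (!listing.fileNames.isEmpty()) {
                    String partition = root.relativize(listing.dir).toString();
                    filesByPartition.put(partition, listing.fileNames);
                }
                next.addAll(listing.subDirs);
            }
            level = next;
        }
        return filesByPartition;
    }
}
```

Because the partition list falls out of the same pass that collects the file
names, no directory is visited a second time. On a distributed file system the
per-level parallelism would more likely come from an engine such as Spark
than from a local parallel stream.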