[
https://issues.apache.org/jira/browse/HUDI-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinoth Chandar updated HUDI-1469:
---------------------------------
Status: Open (was: New)
> Faster initialization for larger datasets
> -----------------------------------------
>
> Key: HUDI-1469
> URL: https://issues.apache.org/jira/browse/HUDI-1469
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Prashant Wason
> Assignee: Prashant Wason
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.7.0
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> For very large tables (200+ partitions and 100K+ files), the current
> initialization code in HoodieBackedTableMetadataWriter is slow because it
> lists all partitions and files sequentially.
> The code is also inefficient because it lists each directory twice: first
> to get the list of partitions and later to get the list of files. Both can
> be done in a single pass.
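
To illustrate the idea in the description, here is a minimal sketch of a single-pass, parallel listing. It is not Hudi code: it is written in Python against a local filesystem, and the names list_dir_once, list_partitions_and_files and parallelism are hypothetical. It also assumes, purely for illustration, that any directory directly containing files counts as a partition.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def list_dir_once(path):
    """List one directory a single time, returning (path, subdirs, files)."""
    subdirs, files = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            else:
                files.append(entry.name)
    return path, subdirs, sorted(files)


def list_partitions_and_files(base_path, parallelism=8):
    """Walk the tree level by level, listing each level's directories in parallel.

    Every directory is listed exactly once; the same listing yields both the
    sub-directories to descend into and the files belonging to that directory.
    """
    partition_files = {}
    pending = [base_path]
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        while pending:
            next_level = []
            for path, subdirs, files in pool.map(list_dir_once, pending):
                # Illustrative assumption: a directory holding files is a partition.
                if files:
                    partition_files[os.path.relpath(path, base_path)] = files
                next_level.extend(subdirs)
            pending = next_level
    return partition_files
```

Calling list_partitions_and_files on a table's base path returns a mapping from relative partition path to file names. The partition list is simply the set of keys, so no second listing pass is needed.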
--
This message was sent by Atlassian Jira
(v8.3.4#803005)