[jira] [Created] (HUDI-1469) Faster initialization for larger datasets

Prashant Wason (Jira) Thu, 17 Dec 2020 14:28:34 -0800

Prashant Wason created HUDI-1469:
------------------------------------

             Summary: Faster initialization for larger datasets
                 Key: HUDI-1469
                 URL: https://issues.apache.org/jira/browse/HUDI-1469
             Project: Apache Hudi
          Issue Type: Sub-task
            Reporter: Prashant Wason
            Assignee: Prashant Wason
             Fix For: 0.7.0



For very large tables (200+ partitions and 100K+ files), the current 
initialization code in HoodieBackedTableMetadataWriter is slow because it 
lists all partitions and files sequentially. 

Also, the above code is inefficient because it lists each directory twice: 
first to get the list of partitions and later to get the list of files. Both 
can be collected in a single listing pass. 
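
A minimal sketch of what a parallel, single-pass listing might look like. This 
is not the HUDI-1469 change itself: it uses java.nio on a local file system 
rather than the Hadoop FileSystem API, a parallel stream rather than a Spark 
job, and the names SinglePassListing, listOnce and listAll are hypothetical. 
It assumes a partition is any directory holding a .hoodie_partition_metadata 
marker file and that dot-prefixed directories such as .hoodie are skipped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class SinglePassListing {
    // Assumption: a directory is a partition if it contains this marker file.
    public static final String PARTITION_MARKER = ".hoodie_partition_metadata";

    /** Everything learned from listing one directory exactly once. */
    public static final class DirListing {
        final Path dir;
        final List<Path> files = new ArrayList<>();
        final List<Path> subDirs = new ArrayList<>();
        boolean isPartition = false;

        DirListing(Path dir) {
            this.dir = dir;
        }
    }

    /** Lists a directory once, recording partition status, files and subdirectories. */
    public static DirListing listOnce(Path dir) {
        DirListing result = new DirListing(dir);
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (Files.isDirectory(entry)) {
                    // Skip metadata directories such as .hoodie (assumption).
                    if (!name.startsWith(".")) {
                        result.subDirs.add(entry);
                    }
                } else if (name.equals(PARTITION_MARKER)) {
                    result.isPartition = true;
                } else {
                    result.files.add(entry);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(result.files); // deterministic order for callers
        return result;
    }

    /** Walks the table level by level, listing each level's directories in parallel. */
    public static Map<Path, List<Path>> listAll(Path root) {
        Map<Path, List<Path>> partitionToFiles = new TreeMap<>();
        List<Path> frontier = Collections.singletonList(root);
        while (!frontier.isEmpty()) {
            List<DirListing> level = frontier.parallelStream()
                    .map(SinglePassListing::listOnce)
                    .collect(Collectors.toList());
            List<Path> next = new ArrayList<>();
            for (DirListing listing : level) {
                if (listing.isPartition) {
                    // Partition found: its files came from the same listing call.
                    partitionToFiles.put(listing.dir, listing.files);
                } else {
                    // Not a partition: descend into its subdirectories next round.
                    next.addAll(listing.subDirs);
                }
            }
            frontier = next;
        }
        return partitionToFiles;
    }
}
```

Calling SinglePassListing.listAll(basePath) returns a map from each partition 
directory to its files. Each directory is visited by exactly one listing call, 
and directories at the same depth are listed concurrently.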



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
