[jira] [Updated] (HUDI-1469) Faster initialization for larger datasets

Vinoth Chandar (Jira) Mon, 21 Dec 2020 10:16:04 -0800


     [ 
https://issues.apache.org/jira/browse/HUDI-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Vinoth Chandar updated HUDI-1469:
---------------------------------
    Status: Open  (was: New)

> Faster initialization for larger datasets
> -----------------------------------------
>
>                 Key: HUDI-1469
>                 URL: https://issues.apache.org/jira/browse/HUDI-1469
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Prashant Wason
>            Assignee: Prashant Wason
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.7.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> For very large tables (200+ partitions and 100K+ files), the current
> initialization code in HoodieBackedTableMetadataWriter is slow because it
> lists all partitions and files sequentially.
> The code is also inefficient because it lists each directory twice: first
> to get the list of partitions and later to get the list of files. Both can
> be collected in a single listing pass.
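
The idea in the description can be illustrated with a minimal sketch. This is
not the Hudi implementation and does not use any Hudi API; it only shows, on a
local file system, how each directory can be listed exactly once while the
directories at the same depth are listed in parallel. The function names
(list_one_dir, list_partitions_and_files), the parallelism parameter, and the
rule "a directory that directly contains files is a partition" are all
assumptions made for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def list_one_dir(path):
    """List a directory once, returning (subdirectory paths, file names)."""
    subdirs, files = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                files.append(entry.name)
    return subdirs, files


def list_partitions_and_files(base_path, parallelism=8, skip_dirs=(".hoodie",)):
    """Return {relative partition path: sorted file names}.

    Assumption for this sketch: any directory that directly contains
    regular files is treated as a partition.
    """
    result = {}
    frontier = [base_path]
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        while frontier:
            # List every directory at the current depth in parallel.
            listings = list(pool.map(list_one_dir, frontier))
            next_frontier = []
            for path, (subdirs, files) in zip(frontier, listings):
                # The same listing yields both the partition and its files,
                # so no directory is listed a second time.
                if files:
                    result[os.path.relpath(path, base_path)] = sorted(files)
                next_frontier.extend(
                    d for d in subdirs if os.path.basename(d) not in skip_dirs
                )
            frontier = next_frontier
    return result
```

Calling list_partitions_and_files("/path/to/table") returns one entry per
partition, so the partition list and the per-partition file lists come out of
the same pass. On a distributed file system the per-directory listing call and
the executor would differ, but the single-pass structure would stay the same.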



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
