[ 
https://issues.apache.org/jira/browse/HUDI-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17331030#comment-17331030
 ] 

sivabalan narayanan commented on HUDI-1697:
-------------------------------------------

Which version of Hudi are you trying out? 

I believe we landed a PR a few months back: 
[https://github.com/apache/hudi/pull/2417] 

[~uditme]: Can you confirm whether that PR fixes what has been raised in this Jira? 

> A parallel scan needed for FS.
> ------------------------------
>
>                 Key: HUDI-1697
>                 URL: https://issues.apache.org/jira/browse/HUDI-1697
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: DeltaStreamer
>            Reporter: Volodymyr Burenin
>            Priority: Major
>              Labels: sev:high, user-support-issues
>
> I am running Hudi with GCS as a backend. It takes far too long to update the 
> file system view for several hundred partitions. I think it can be done in 
> parallel, so the process could be sped up significantly.
> Here is a short excerpt from the logs where I noticed the slow processing. The 
> full log is much longer, and the process takes several minutes to complete.
> ```
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: #files found in partition 
> (2020/05/12) =66, Time taken =45
> 21/03/16 20:02:56 INFO HoodieTableFileSystemView: Adding file-groups for 
> partition :2020/05/12, #FileGroups=22
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: addFilesToView: 
> NumFiles=66, NumFileGroups=22, FileGroupsCreationTime=3, StoreTimeTaken=1
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: Time to load partition 
> (2020/05/12) =76
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: Took 1 ms to read 0 
> instants, 0 replaced file groups
> 21/03/16 20:02:56 INFO ClusteringUtils: Found 0 files in pending clustering 
> operations
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: Building file system view 
> for partition (2020/03/25)
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: #files found in partition 
> (2020/03/25) =36, Time taken =36
> 21/03/16 20:02:56 INFO HoodieTableFileSystemView: Adding file-groups for 
> partition :2020/03/25, #FileGroups=12
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: addFilesToView: 
> NumFiles=36, NumFileGroups=12, FileGroupsCreationTime=1, StoreTimeTaken=1
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: Time to load partition 
> (2020/03/25) =62
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: Took 0 ms to read 0 
> instants, 0 replaced file groups
> 21/03/16 20:02:56 INFO ClusteringUtils: Found 0 files in pending clustering 
> operations
> 21/03/16 20:02:56 INFO AbstractTableFileSystemView: Building file system view 
> for partition (2020/10/15)
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: #files found in partition 
> (2020/10/15) =201, Time taken =100
> 21/03/16 20:02:57 INFO HoodieTableFileSystemView: Adding file-groups for 
> partition :2020/10/15, #FileGroups=128
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: addFilesToView: 
> NumFiles=201, NumFileGroups=128, FileGroupsCreationTime=6, StoreTimeTaken=1
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: Time to load partition 
> (2020/10/15) =148
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: Took 0 ms to read 0 
> instants, 0 replaced file groups
> 21/03/16 20:02:57 INFO ClusteringUtils: Found 0 files in pending clustering 
> operations
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: Building file system view 
> for partition (2021/01/11)
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: #files found in partition 
> (2021/01/11) =311, Time taken =71
> 21/03/16 20:02:57 INFO HoodieTableFileSystemView: Adding file-groups for 
> partition :2021/01/11, #FileGroups=302
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: addFilesToView: 
> NumFiles=311, NumFileGroups=302, FileGroupsCreationTime=9, StoreTimeTaken=1
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: Time to load partition 
> (2021/01/11) =110
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: Took 0 ms to read 0 
> instants, 0 replaced file groups
> 21/03/16 20:02:57 INFO ClusteringUtils: Found 0 files in pending clustering 
> operations
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: Building file system view 
> for partition (2019/07/08)
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: #files found in partition 
> (2019/07/08) =2, Time taken =40
> 21/03/16 20:02:57 INFO HoodieTableFileSystemView: Adding file-groups for 
> partition :2019/07/08, #FileGroups=1
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: addFilesToView: 
> NumFiles=2, NumFileGroups=1, FileGroupsCreationTime=0, StoreTimeTaken=1
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: Time to load partition 
> (2019/07/08) =63
> 21/03/16 20:02:57 INFO AbstractTableFileSystemView: Took 0 ms to read 0 
> instants, 0 replaced file groups
> 21/03/16 20:02:57 INFO ClusteringUtils: Found 0 files in pending clustering 
> operations
> ```
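
In the excerpt above, the partitions are loaded one after another, each taking roughly 60 to 150 ms, so several hundred partitions add up quickly. As a rough illustration of the parallel scan the description asks for, here is a minimal Python sketch. It is not Hudi code and makes no claim about what the linked PR does. The function names `list_partition` and `list_partitions_parallel` are hypothetical, and a local directory tree stands in for the GCS-backed table.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def list_partition(base_path, partition):
    """List the file names in one partition directory (hypothetical helper)."""
    with os.scandir(os.path.join(base_path, partition)) as entries:
        return partition, sorted(e.name for e in entries if e.is_file())


def list_partitions_parallel(base_path, partitions, max_workers=16):
    """List many partitions concurrently instead of one after another.

    Listing is I/O bound, so worker threads can overlap the per-partition
    latency. Returns a dict mapping each partition path to its file names.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The lazy map result is consumed inside the `with` block,
        # before the pool shuts down.
        results = pool.map(lambda p: list_partition(base_path, p), partitions)
        return dict(results)
```

Calling `list_partitions_parallel("/path/to/table", ["2020/05/12", "2020/03/25"])` would return the file names per partition. Against an object store, the sensible `max_workers` value depends on request rate limits, so the default of 16 here is only a placeholder.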



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
