[ 
https://issues.apache.org/jira/browse/HUDI-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsha Teja Kanna updated HUDI-3066:
------------------------------------
    Description: 
After the metadata table is enabled, file listing takes a long time.

If metadata is enabled on the reader side, each file listing task takes even 
more time.
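
For reference, a minimal sketch of what enabling the metadata table on the reader side might look like from PySpark. `hoodie.metadata.enable` is the Hudi config key; the helper name, `spark`, and `base_path` are placeholders for illustration, and the job configuration actually used here is not shown in this issue.

```python
def hudi_read_options(use_metadata: bool) -> dict:
    """Hypothetical helper: Hudi reader options toggling the metadata table."""
    return {"hoodie.metadata.enable": str(use_metadata).lower()}


# With a live SparkSession `spark` and a table path `base_path` (both assumed):
# df = spark.read.format("hudi").options(**hudi_read_options(True)).load(base_path)
```

Flipping the argument to `False` gives a direct file-system listing for comparison on the same table.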

The existing tables are copy-on-write (COW), have inline clustering enabled, 
and have many replace commits.

The logs seem to suggest that the delay is in the resetFileGroupsReplaced 
function of view.AbstractTableFileSystemView.

 

2021-12-18 23:17:54,056 INFO view.AbstractTableFileSystemView: Took 4118 ms to read  136 instants, 9731 replaced file groups

2021-12-18 23:38:35,380 INFO log.HoodieMergedLogRecordScanner: Number of log files scanned => 437
2021-12-18 23:38:35,380 INFO log.HoodieMergedLogRecordScanner: MaxMemoryInBytes allowed for compaction => 1073741824
2021-12-18 23:38:35,380 INFO log.HoodieMergedLogRecordScanner: Number of entries in MemoryBasedMap in ExternalSpillableMap => 165
2021-12-18 23:38:35,380 INFO log.HoodieMergedLogRecordScanner: Total size in bytes of MemoryBasedMap in ExternalSpillableMap => 259380
2021-12-18 23:38:35,380 INFO log.HoodieMergedLogRecordScanner: Number of entries in BitCaskDiskMap in ExternalSpillableMap => 0
2021-12-18 23:38:35,380 INFO log.HoodieMergedLogRecordScanner: Size of file spilled to disk => 0
2021-12-18 23:38:35,380 INFO metadata.HoodieBackedTableMetadata: Opened 437 metadata log files (dataset instant=20211218233649435, metadata instant=20211218233649435) in 22935 ms

2021-12-18 23:38:37,193 INFO metadata.HoodieBackedTableMetadata: Opened 437 metadata log files (dataset instant=20211218233649435, metadata instant=20211218233649435) in 22802 ms
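
To compare the timed steps in output like this, the durations can be pulled out of the log lines and ranked. The following is a minimal sketch, assuming each entry sits on one line in the `<date> <time> <LEVEL> <logger>: <message>` shape seen above; `timed_steps` and both regular expressions are illustrative and not part of Hudi.

```python
import re

# Shape of the log lines quoted above: "<date> <time> <LEVEL> <logger>: <message>"
LINE_RE = re.compile(r"^(\S+ \S+) (\w+) ([\w.]+): (.*)$")
# A duration reported in the message, e.g. "Took 4118 ms" or "in 22935 ms"
DURATION_RE = re.compile(r"(\d+) ms\b")


def timed_steps(lines):
    """Return (logger, milliseconds) pairs for timed entries, slowest first."""
    steps = []
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if parsed is None:
            continue  # not a log line in the expected shape
        logger, message = parsed.group(3), parsed.group(4)
        duration = DURATION_RE.search(message)
        if duration:
            steps.append((logger, int(duration.group(1))))
    return sorted(steps, key=lambda step: step[1], reverse=True)


# A few of the entries from this issue, one per line.
sample = [
    "2021-12-18 23:17:54,056 INFO view.AbstractTableFileSystemView: "
    "Took 4118 ms to read  136 instants, 9731 replaced file groups",
    "2021-12-18 23:38:35,380 INFO log.HoodieMergedLogRecordScanner: "
    "Number of log files scanned => 437",
    "2021-12-18 23:38:35,380 INFO metadata.HoodieBackedTableMetadata: "
    "Opened 437 metadata log files (dataset instant=20211218233649435, "
    "metadata instant=20211218233649435) in 22935 ms",
    "2021-12-18 23:38:37,193 INFO metadata.HoodieBackedTableMetadata: "
    "Opened 437 metadata log files (dataset instant=20211218233649435, "
    "metadata instant=20211218233649435) in 22802 ms",
]

for logger, millis in timed_steps(sample):
    print(f"{millis:>6} ms  {logger}")
```

On these entries the two HoodieBackedTableMetadata lines (22935 ms and 22802 ms) rank ahead of the AbstractTableFileSystemView line (4118 ms).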

Sample file listing tasks

!Screen Shot 2021-12-18 at 6.16.29 PM.png!

 

  was:
After the metadata table is enabled, file listing takes a long time.

If metadata is enabled on the reader side, each file listing task takes even 
more time.

The existing tables are copy-on-write (COW), have inline clustering enabled, 
and have many replace commits.

The logs seem to suggest that the delay is in the resetFileGroupsReplaced 
function of view.AbstractTableFileSystemView.

 

2021-12-18 23:17:54,056 INFO view.AbstractTableFileSystemView: Took 4118 ms to read  136 instants, 9731 replaced file groups

2021-12-18 23:38:37,193 INFO metadata.HoodieBackedTableMetadata: Opened 437 metadata log files (dataset instant=20211218233649435, metadata instant=20211218233649435) in 22802 ms

Sample file listing tasks

!Screen Shot 2021-12-18 at 6.16.29 PM.png!

 


> Very slow file listing after enabling metadata for existing tables in 0.10.0 
> release
> ------------------------------------------------------------------------------------
>
>                 Key: HUDI-3066
>                 URL: https://issues.apache.org/jira/browse/HUDI-3066
>             Project: Apache Hudi
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>         Environment: EMR 6.4.0
> Hudi version : 0.10.0
>            Reporter: Harsha Teja Kanna
>            Priority: Critical
>              Labels: performance
>         Attachments: Screen Shot 2021-12-18 at 6.16.29 PM.png
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)