TengHuo commented on PR #7626:
URL: https://github.com/apache/hudi/pull/7626#issuecomment-1384806549

   Hi @trushev 
   
   Nice feature! We are hitting a similar memory problem in our Flink Hudi MOR pipeline: the task managers show heap OOM exceptions and abnormal GC activity.
   
   Task manager GC metrics panel
   
   
![tm_gc](https://user-images.githubusercontent.com/7539060/212806036-e3a83720-ba72-42b0-9247-af9ca0913b0c.png)
   
   After checking, we noticed that the in-memory footprint of `CompactionOperation` is unusually large. It appears to be caused by `HoodieTableFileSystemView`, because each instance of `HoodieTableFileSystemView` loads all pending compaction plans from the timeline into memory.
   
   This is the part of the task manager heap histogram that shows the abnormal memory usage caused by `CompactionOperation` (columns: rank, instance count, bytes, class name).
   
   ```log
      9:       2091712       83668480  org.apache.hudi.common.model.CompactionOperation
    479:            27           4752  org.apache.hudi.io.FlinkAppendHandle
    686:            28           2016  org.apache.hudi.common.table.view.HoodieTableFileSystemView
    800:            28           1344  org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView
    806:            27           1296  org.apache.hudi.table.HoodieFlinkMergeOnReadTable
    954:            27            864  org.apache.hudi.common.table.view.FileSystemViewManager
   1064:            28            672  org.apache.hudi.common.table.view.PriorityBasedFileSystemView
   ```
   
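   The timeline of our pipeline contained only 1 unfinished compaction plan, with 74704 operations. The histogram shows 28 `HoodieTableFileSystemView` instances, so the `CompactionOperation` count matches one full copy of the plan per view: `74704 * 28 = 2091712`.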
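
The arithmetic can be double-checked with a small sketch. The class and method names below are made up for illustration and are not part of Hudi. The per-object size assumes the histogram reports shallow sizes (as `jmap -histo` does), so the retained memory per operation is likely larger.

```java
// Back-of-the-envelope check of the histogram numbers quoted above.
// CompactionFootprint is a hypothetical helper, not a Hudi class.
final class CompactionFootprint {

    // Total CompactionOperation objects if every view holds its own copy of the plan.
    static long totalInstances(long operationsPerPlan, long viewInstances) {
        return Math.multiplyExact(operationsPerPlan, viewInstances);
    }

    // Average shallow size per object, from the histogram's bytes and instance columns.
    static long shallowBytesPerInstance(long totalBytes, long instances) {
        return totalBytes / instances;
    }

    public static void main(String[] args) {
        long instances = totalInstances(74_704L, 28L);
        System.out.println("instances = " + instances);            // 2091712
        System.out.println("bytes per instance = "
                + shallowBytesPerInstance(83_668_480L, instances)); // 40
    }
}
```

So roughly 80 MB of shallow heap is spent on 28 copies of the same 74704 operations, before counting the objects each operation references.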
   
   Would it be possible to load `HoodieTableFileSystemView` lazily in `PriorityBasedFileSystemView` when creating `FlinkAppendHandle`? That could also reduce memory usage for active partitions.
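
To make the idea concrete, here is a minimal sketch of lazy loading with a memoizing supplier. `LazyView` is a hypothetical wrapper, not an existing Hudi class, and how it would be wired into `PriorityBasedFileSystemView` is an open question; the point is only that the expensive view (and its pending compaction state) is not built until something actually asks for it.

```java
import java.util.Objects;
import java.util.function.Supplier;

// Hypothetical wrapper (not a Hudi API): defers building an expensive object
// until the first get(), then reuses that single instance for later calls.
final class LazyView<T> implements Supplier<T> {
    private final Supplier<T> loader;
    private volatile T value; // null until the first successful load

    LazyView(Supplier<T> loader) {
        this.loader = Objects.requireNonNull(loader, "loader");
    }

    @Override
    public T get() {
        T local = value;
        if (local == null) {
            synchronized (this) {
                local = value;
                if (local == null) {
                    // The loader runs at most once, even under concurrent access.
                    local = Objects.requireNonNull(loader.get(), "loader returned null");
                    value = local;
                }
            }
        }
        return local;
    }

    // True once the underlying object has been built.
    boolean isLoaded() {
        return value != null;
    }

    public static void main(String[] args) {
        // The lambda stands in for the costly file system view construction.
        LazyView<String> view = new LazyView<>(() -> "expensive view state");
        System.out.println("loaded before use: " + view.isLoaded());
        System.out.println("value: " + view.get());
        System.out.println("loaded after use: " + view.isLoaded());
    }
}
```

With a wrapper like this, a handle that only ever talks to the remote view would never trigger the load of the in-memory fallback.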
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
