[ 
https://issues.apache.org/jira/browse/FLINK-19852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233895#comment-17233895
 ] 

Roman Khachatryan edited comment on FLINK-19852 at 11/17/20, 7:51 PM:
----------------------------------------------------------------------

I took a look at the code and I think it can be solved by:
 # "transferring" memory segments from an old TempBarrier to the new one in 
BatchTask.resetAllInputs()
 # For that, adding a TempBarrier.closeForReuse() method that returns the 
segments instead of calling memManager.release()
 # In TempBarrier constructor, some memory still has to be allocated because 
some segments might have been returned to reader/writer
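
A rough sketch of the three steps above, only to illustrate the idea. Pool, 
Barrier and Segment are hypothetical, simplified stand-ins and not the real 
MemoryManager, TempBarrier and MemorySegment; closeForReuse() is the method 
proposed above, and handOut() merely simulates a segment that is still held by 
the reader/writer when the barrier is closed.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for illustration; none of this is actual Flink code.
class SegmentTransferSketch {

    /** Stand-in for a memory segment. */
    static final class Segment {}

    /** Stand-in for the memory manager; counts allocations and releases. */
    static final class Pool {
        int allocated = 0;
        int released = 0;

        List<Segment> allocate(int count) {
            List<Segment> out = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                out.add(new Segment());
            }
            allocated += count;
            return out;
        }

        void release(List<Segment> segments) {
            released += segments.size();
            segments.clear();
        }
    }

    /** Stand-in for TempBarrier. */
    static final class Barrier {
        private final Pool pool;
        private final List<Segment> segments = new ArrayList<>();

        Barrier(Pool pool, int required, List<Segment> reused) {
            this.pool = pool;
            // Step 1: take over as many segments from the old barrier as needed.
            int take = Math.min(required, reused.size());
            segments.addAll(reused.subList(0, take));
            // Surplus segments go back to the pool.
            if (take < reused.size()) {
                pool.release(new ArrayList<>(reused.subList(take, reused.size())));
            }
            // Step 3: allocate only what is still missing.
            if (take < required) {
                segments.addAll(pool.allocate(required - take));
            }
        }

        /** Simulates a segment that is currently with the reader/writer. */
        Segment handOut() {
            return segments.remove(segments.size() - 1);
        }

        /** Current behaviour: give everything back to the pool. */
        void close() {
            pool.release(segments);
        }

        /** Step 2: return the segments instead of releasing them. */
        List<Segment> closeForReuse() {
            List<Segment> out = new ArrayList<>(segments);
            segments.clear();
            return out;
        }

        int size() {
            return segments.size();
        }
    }

    public static void main(String[] args) {
        Pool pool = new Pool();
        Barrier old = new Barrier(pool, 4, new ArrayList<>());
        old.handOut(); // one segment is not available at reset time
        List<Segment> carried = old.closeForReuse();
        Barrier next = new Barrier(pool, 4, carried);
        System.out.println("allocated=" + pool.allocated + ", released=" + pool.released);
    }
}
```

In this toy run the second barrier requests only one new segment instead of 
four, and nothing is released in between, so there would be nothing for the 
released-memory check to wait on.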

I don't see a clean way to collect the old segments and create a new TempBarrier 
instance atomically, because initInputLocalStrategy() has to be called in 
between. Reusing a TempBarrier instance instead seems error-prone.

WDYT?

Besides that, I have some concerns regarding the issue:
 # initInputLocalStrategy() might also allocate memory. Are we sure that the 
degradation is not caused by this (e.g. ExternalSorterBuilder.doBuild())?
 # At least one thread is created for each TempBarrier; the same question 
applies there
 # How big is the regression? Are there any numbers? Critical priority seems a 
bit subjective given that this issue first appeared in 1.10

[~shaomeng.wang], can you maybe clarify this?


> Managed memory released check can block IterativeTask
> -----------------------------------------------------
>
>                 Key: FLINK-19852
>                 URL: https://issues.apache.org/jira/browse/FLINK-19852
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.11.0, 1.10.2, 1.12.0, 1.11.1, 1.11.2
>            Reporter: shaomeng.wang
>            Assignee: Roman Khachatryan
>            Priority: Critical
>         Attachments: image-2020-10-28-17-48-28-395.png, 
> image-2020-10-28-17-48-48-583.png
>
>
> UnsafeMemoryBudget#reserveMemory, called from TempBarrier, has to wait for 
> GC of all allocated/released managed memory at every iteration.
>  
> stack:
> !image-2020-10-28-17-48-48-583.png!
> new TempBarrier in BatchTask
> !image-2020-10-28-17-48-28-395.png!
>  
> This makes iterations much slower than before.



