[ 
https://issues.apache.org/jira/browse/FLINK-14894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023015#comment-17023015
 ] 

Andrey Zagrebin commented on FLINK-14894:
-----------------------------------------

[~sewen], [~trohrmann] and I had an offline discussion.

The conclusion at the moment is that explicitly releasing unsafe memory while 
Java code may still hold references to it is dangerous. We will revert this and 
rely only on GC to release the memory once no references remain in Java code. 
The problem can occur, for example, if a task thread exits without joining its 
IO threads (e.g. spilling in a batch job): the unsafe memory is released, but an 
IO thread can still write to it without a segfault. Meanwhile, another task can 
allocate memory that overlaps the released region, and that IO thread can then 
corrupt it. We still keep the memory unsafe so that it is allocated outside of 
the JVM direct memory limit and does not interfere with direct allocations. It 
also does not make sense for RocksDB native memory (which is also accounted for 
in MemoryManager) to be part of the direct memory limit.
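
To illustrate the "release only via GC" idea, here is a minimal sketch, not the 
actual Flink code. It assumes Java 9+ (java.lang.ref.Cleaner); the class 
GcReleasedSegment, its fields and the FREED counter are hypothetical, and the 
counter merely stands in for the native free call, which is not shown.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: memory owned by this object is freed only by a GC-driven cleaner. */
final class GcReleasedSegment {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Counts simulated frees; an assumption standing in for a real native free. */
    static final AtomicInteger FREED = new AtomicInteger();

    /**
     * Release action. It is a static nested class on purpose: it must not
     * reference the segment, otherwise the segment never becomes unreachable.
     */
    private static final class Release implements Runnable {
        private final long address;

        Release(long address) {
            this.address = address;
        }

        @Override
        public void run() {
            // A real implementation would free the memory at 'address' here.
            FREED.incrementAndGet();
        }
    }

    final long address;

    /** Exposed only so the sketch can be exercised deterministically. */
    final Cleaner.Cleanable cleanable;

    GcReleasedSegment(long address) {
        this.address = address;
        // The action runs at most once, after the GC finds 'this' phantom reachable.
        this.cleanable = CLEANER.register(this, new Release(address));
    }
}
```

As long as any thread (task or IO) still holds the segment, the cleaner cannot 
run, so the memory cannot be freed underneath a late writer. Calling 
cleanable.clean() directly would reintroduce the hazard described above.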

The potential downside is that over-allocation of unsafe memory will not hit 
the direct memory limit and therefore will not trigger GC immediately, although 
GC is now the only way to release it. In this case, allocations can fail with 
out-of-memory errors without GC having been triggered to release a potentially 
large amount of memory that is already unused.

If the delayed release turns out to be a problem, we can investigate further 
optimisations, such as:
 * directly monitoring the phantom reference queue of the cleaner (if the JVM 
detects quickly that there are no more references to the memory) and explicitly 
releasing memory that is ready for GC as soon as possible, e.g. after Task exit
 * monitoring the amount of allocated memory and blocking allocation until GC 
releases the occupied memory, instead of failing with out-of-memory immediately
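
Both ideas could be combined roughly as in the sketch below. This is only an 
illustration under assumptions, not a proposal for the MemoryManager API: the 
class GcAwareBudget and its methods are hypothetical, it tracks a byte budget 
only (no real allocation), and it does not trigger GC itself.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a byte budget that is given back when owners are garbage collected. */
final class GcAwareBudget {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the phantom references alive and remembers how many bytes each one guards.
    private final Map<Reference<?>, Long> reserved = new HashMap<>();
    private final long totalBytes;
    private long usedBytes;

    GcAwareBudget(long totalBytes) {
        this.totalBytes = totalBytes;
    }

    /**
     * Reserves 'bytes' for 'owner'. If the budget is exhausted, waits up to
     * 'timeoutMillis' for collected owners to show up in the reference queue
     * instead of failing immediately. Returns null if the wait times out.
     */
    synchronized Reference<Object> reserve(Object owner, long bytes, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        drain();
        while (usedBytes + bytes > totalBytes) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return null;
            }
            try {
                // Blocks until some owner has been collected or the timeout expires.
                Reference<?> collected = queue.remove(remainingMillis);
                if (collected != null) {
                    release(collected);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        Reference<Object> ref = new PhantomReference<>(owner, queue);
        reserved.put(ref, bytes);
        usedBytes += bytes;
        return ref;
    }

    synchronized long usedBytes() {
        drain();
        return usedBytes;
    }

    /** Gives back the budget of every owner already enqueued by the GC. */
    private void drain() {
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            release(ref);
        }
    }

    private void release(Reference<?> ref) {
        Long bytes = reserved.remove(ref);
        if (bytes != null) {
            usedBytes -= bytes; // a real implementation would also free the memory here
        }
    }
}
```

The sketch holds its lock while waiting, which keeps it short but would 
serialize concurrent allocations; a real implementation would need finer 
locking and probably a way to trigger GC when the budget runs out.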

> HybridOffHeapUnsafeMemorySegmentTest#testByteBufferWrap failed on Travis
> ------------------------------------------------------------------------
>
>                 Key: FLINK-14894
>                 URL: https://issues.apache.org/jira/browse/FLINK-14894
>             Project: Flink
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 1.10.0
>            Reporter: Gary Yao
>            Assignee: Andrey Zagrebin
>            Priority: Major
>              Labels: test-stability
>             Fix For: 1.10.0
>
>
> {noformat}
> HybridOffHeapUnsafeMemorySegmentTest>MemorySegmentTestBase.testByteBufferWrapping:2465
>  expected:<992288337> but was:<196608>
> {noformat}
> https://api.travis-ci.com/v3/job/258950527/log.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
