[
https://issues.apache.org/jira/browse/FLINK-14894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023015#comment-17023015
]
Andrey Zagrebin commented on FLINK-14894:
-----------------------------------------
[~sewen], [~trohrmann] and I had an offline discussion.
The conclusion at the moment is that releasing unsafe memory while Java code
may still hold a reference to it is dangerous. We are reverting this to rely
only on GC, which releases the memory once there are no references left in Java
code. The problem can happen e.g. if the task thread exits w/o joining the IO
threads (e.g. spilling in a batch job): the unsafe memory is released, but an IO
thread can still write to it w/o a segfault. At the same time, another task can
allocate overlapping memory, which that IO thread can then corrupt. We still
keep the memory unsafe so that it is allocated outside of the JVM direct memory
limit and does not interfere with direct allocations. It also does not make
sense for RocksDB native memory (also accounted for in MemoryManager) to be part
of the direct memory limit.
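For illustration only, a minimal sketch of what "release only via GC" means,
assuming Java 9+ and java.lang.ref.Cleaner. The class GcReleasedMemory and its
ALLOCATED counter are hypothetical names, not Flink code, and a counter stands
in for the real native allocation so that nothing unsafe is touched:

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical owner of natively allocated memory; not Flink's API. */
final class GcReleasedMemory {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Simulated accounting of allocated bytes (no real native calls). */
    static final AtomicLong ALLOCATED = new AtomicLong();

    /**
     * Cleanup action. It must not reference the owner object, otherwise the
     * owner would never become unreachable and the action would never run.
     */
    private static final class Release implements Runnable {
        private final long size;

        Release(long size) {
            this.size = size;
        }

        @Override
        public void run() {
            // A real implementation would free the native memory here.
            ALLOCATED.addAndGet(-size);
        }
    }

    private final long size;

    GcReleasedMemory(long size) {
        this.size = size;
        ALLOCATED.addAndGet(size);
        // The action runs only after this object is no longer reachable.
        CLEANER.register(this, new Release(size));
    }

    long size() {
        return size;
    }
}
```

There is deliberately no explicit free method here: as long as any thread
(e.g. an IO thread) still holds the object, the memory stays valid.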
The potential downside is that over-allocating unsafe memory will not hit the
direct memory limit and so will not trigger GC immediately, although GC is now
the only way to release it. In this case, allocation can fail with
out-of-memory errors w/o triggering the GC that would release a lot of
potentially already unused memory.
If we see the delayed release as a problem, we can investigate further
optimisations, like:
* directly monitoring the phantom reference queue of the cleaner (if the JVM
detects quickly that there are no more references to the memory) and explicitly
releasing memory that is ready for GC asap, e.g. after Task exit
* monitoring the amount of allocated memory and blocking allocation until GC
releases occupied memory instead of failing with out-of-memory immediately
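A rough sketch of how the two ideas above could be combined, not a proposal for
the actual implementation. GcAwareBudget and all of its members are
hypothetical names. It tracks owners with phantom references, gives bytes back
as soon as the collector enqueues a reference, and lets an allocation wait for
that (up to a timeout) instead of failing right away:

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical byte budget for GC-released memory; not part of Flink's API. */
final class GcAwareBudget {

    /** Phantom reference that remembers how many bytes its owner reserved. */
    private static final class Tracked extends PhantomReference<Object> {
        final long size;

        Tracked(Object owner, long size, ReferenceQueue<Object> queue) {
            super(owner, queue);
            this.size = size;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the reference objects strongly reachable until they are processed.
    private final Set<Reference<?>> live = new HashSet<>();
    private final long limit;
    private long used;

    GcAwareBudget(long limit) {
        this.limit = limit;
    }

    synchronized long used() {
        return used;
    }

    /**
     * Reserves size bytes on behalf of owner, waiting up to timeoutMillis for
     * unreachable owners to give their bytes back. Returns the tracking
     * reference, or null if the budget stayed exhausted or the thread was
     * interrupted. For simplicity the lock is held while waiting.
     */
    synchronized Reference<?> reserve(Object owner, long size, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        drain();
        if (used + size > limit) {
            System.gc(); // only a hint; the JVM may ignore it
        }
        while (used + size > limit) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                return null;
            }
            try {
                // Wait until the collector reports an unreachable owner.
                Reference<?> ref = queue.remove(remainingMillis);
                if (ref != null) {
                    release(ref);
                    drain();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        Tracked tracked = new Tracked(owner, size, queue);
        live.add(tracked);
        used += size;
        return tracked;
    }

    /** Releases the bytes of every owner the collector has already reported. */
    private void drain() {
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            release(ref);
        }
    }

    private void release(Reference<?> ref) {
        if (live.remove(ref)) {
            used -= ((Tracked) ref).size;
        }
    }
}
```

How quickly bytes come back still depends on how soon the JVM notices that an
owner is unreachable, which is the open question in the first bullet.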
> HybridOffHeapUnsafeMemorySegmentTest#testByteBufferWrap failed on Travis
> ------------------------------------------------------------------------
>
> Key: FLINK-14894
> URL: https://issues.apache.org/jira/browse/FLINK-14894
> Project: Flink
> Issue Type: Bug
> Components: Tests
> Affects Versions: 1.10.0
> Reporter: Gary Yao
> Assignee: Andrey Zagrebin
> Priority: Major
> Labels: test-stability
> Fix For: 1.10.0
>
>
> {noformat}
> HybridOffHeapUnsafeMemorySegmentTest>MemorySegmentTestBase.testByteBufferWrapping:2465
> expected:<992288337> but was:<196608>
> {noformat}
> https://api.travis-ci.com/v3/job/258950527/log.txt