kangrong created IOTDB-84:
-----------------------------
Summary: Out-of-Memory bug
Key: IOTDB-84
URL: https://issues.apache.org/jira/browse/IOTDB-84
Project: Apache IoTDB
Issue Type: Bug
Reporter: kangrong
Attachments: image-2019-04-22-12-38-04-903.png
An out-of-memory problem occurred in the last long-term test of the branch
"add_disabled_mem_control":
!image-2019-04-22-12-38-04-903.png!
We analyzed the cause and propose the following fixes:
# *Flushing to disk may double the memory cost*: A storage group maintains a
list of ChunkGroups in memory, which is flushed to disk when its occupied
memory exceeds the threshold (128 MB by default).
## In the current implementation, when a flush starts, each ChunkGroup is
encoded in memory, producing a new byte array that stays in memory. These byte
arrays are only released together after all ChunkGroups have been encoded.
Since an encoded byte array has a size comparable to the original data (0.5× to
1×), this strategy may double the memory usage in the worst case.
## Solution: the flush strategy needs to be redesigned. In TsFile, a Page is
the minimal flush unit: a ChunkGroup contains several Chunks, and a Chunk
contains several Pages. Once a Page is encoded into a byte array, we can flush
that byte array to disk and release it immediately, so the extra memory is at
most one page size (64 KB by default). This modification involves a series of
cascading changes, including the metadata format and the writing process. A
rough sketch of this flush strategy is given after this list.
# *Memory Control Strategy*: The memory control strategy needs to be
redesigned, for example by assigning 60% of the memory to the writing process
and 30% to the querying process. The writing memory covers both the memtables
and the flush process. When an Insert arrives, if its required memory exceeds
TotalMem * 0.6 - MemTableUsage - FlushUsage, the Insert is rejected (see the
second sketch after this list).
# *Is the memory statistic accurate?* In the current code, the memory usage of
a TSRecord Java object, which corresponds to one Insert SQL statement, is
calculated by summing the raw sizes of its DataPoints. For example, for "insert
into root.a.b.c(timestamp, v1, v2) values(1L, true, 1.2f)" the counted usage is
8 + 1 + 4 = 13 bytes, which ignores object headers and other JVM overhead. The
memory accounting needs to be redesigned carefully (see the third sketch after
this list).
# *Is there still a memory leak?* As shown in the log of the last crash caused
by the out-of-memory exception, the actual JVM memory usage was 18 GB, whereas
our memory statistics module only counted 8 GB. Besides the inaccuracy
mentioned in point 3, we suspect there is still a memory leak or some other
hidden problem. We will continue to debug it.
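
For point 1, below is a minimal sketch of the proposed page-level flush. All
type and method names (PageLevelFlusher, ChunkGroup, Chunk, Page, encode) are
hypothetical placeholders, not the actual TsFile writer API:

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;

// Sketch of page-level flushing: each Page is encoded, written and released
// one by one, so the extra memory held during a flush is at most one encoded
// page (~64 KB by default) instead of the encoded byte arrays of every
// ChunkGroup at once.
public class PageLevelFlusher {

  interface Page { byte[] encode(); }               // encodes this page into a byte array
  interface Chunk { List<Page> getPages(); }
  interface ChunkGroup { List<Chunk> getChunks(); }

  public void flushChunkGroup(ChunkGroup chunkGroup, OutputStream out) throws IOException {
    for (Chunk chunk : chunkGroup.getChunks()) {
      for (Page page : chunk.getPages()) {
        byte[] encoded = page.encode(); // encode only this page
        out.write(encoded);             // write it to disk immediately
        // 'encoded' is unreachable after this iteration, so it can be
        // garbage-collected before the next page is encoded.
      }
    }
    out.flush();
  }
}
{code}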
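
For point 2, a minimal sketch of the write-memory admission check; the class
and field names (WriteMemoryController, memTableUsage, flushUsage) are
hypothetical, not existing IoTDB code:

{code:java}
// Writes get 60% of the total memory; an Insert is rejected when its estimated
// cost would push MemTableUsage + FlushUsage over that budget.
public class WriteMemoryController {

  private static final double WRITE_RATIO = 0.6;

  private final long totalMemory;  // total memory budget in bytes
  private long memTableUsage;      // memory held by memtables
  private long flushUsage;         // memory held by ongoing flushes

  public WriteMemoryController(long totalMemory) {
    this.totalMemory = totalMemory;
  }

  /** Returns true if the Insert may proceed, false if it must be rejected. */
  public synchronized boolean tryAcquireForInsert(long requiredBytes) {
    long available = (long) (totalMemory * WRITE_RATIO) - memTableUsage - flushUsage;
    if (requiredBytes > available) {
      return false;                // TotalMem * 0.6 - MemTableUsage - FlushUsage exceeded
    }
    memTableUsage += requiredBytes;
    return true;
  }

  /** Called when a memtable of the given size starts flushing. */
  public synchronized void moveToFlush(long bytes) {
    memTableUsage -= bytes;
    flushUsage += bytes;
  }

  /** Called when a flush of the given size finishes and its memory is released. */
  public synchronized void releaseFlushed(long bytes) {
    flushUsage -= bytes;
  }
}
{code}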
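
For point 3, a minimal sketch of a memory estimate that also counts JVM object
overhead; the overhead constants are illustrative assumptions, not measured
values, and the helper names are hypothetical:

{code:java}
// The current accounting only sums raw primitive sizes, e.g.
// 8 (long timestamp) + 1 (boolean) + 4 (float) = 13 bytes for
// "insert into root.a.b.c(timestamp, v1, v2) values(1L, true, 1.2f)",
// while every Java object also pays for headers, references and alignment.
public class MemoryEstimator {

  // Rough figures for a 64-bit JVM with compressed oops (assumptions):
  private static final long OBJECT_HEADER = 16;  // header of each object
  private static final long REFERENCE = 4;       // one reference field
  private static final long ALIGNMENT = 8;       // objects are 8-byte aligned

  /** Rough size of one DataPoint object holding a primitive of 'valueBytes'. */
  public static long dataPointSize(long valueBytes) {
    return align(OBJECT_HEADER + REFERENCE /* measurement ref */ + valueBytes);
  }

  /** Rough size of a TSRecord with the given value widths, e.g. {1, 4}. */
  public static long tsRecordSize(long[] valueWidths) {
    long size = OBJECT_HEADER + 8 /* timestamp */ + REFERENCE /* list ref */;
    for (long width : valueWidths) {
      size += REFERENCE + dataPointSize(width);  // list slot + the DataPoint itself
    }
    return align(size);
  }

  private static long align(long bytes) {
    return ((bytes + ALIGNMENT - 1) / ALIGNMENT) * ALIGNMENT;
  }

  public static void main(String[] args) {
    // The example insert above: a boolean (1 byte) and a float (4 bytes).
    System.out.println("naive estimate : " + (8 + 1 + 4) + " bytes");
    System.out.println("with overheads : " + tsRecordSize(new long[] {1, 4}) + " bytes");
  }
}
{code}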
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)