[
https://issues.apache.org/jira/browse/IOTDB-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
kangrong closed IOTDB-84.
-------------------------
> Out-of-Memory bug
> -----------------
>
> Key: IOTDB-84
> URL: https://issues.apache.org/jira/browse/IOTDB-84
> Project: Apache IoTDB
> Issue Type: Bug
> Reporter: kangrong
> Priority: Major
> Labels: out-of-memory, pull-request-available
> Attachments: image-2019-04-22-12-38-04-903.png,
> image-2019-04-24-11-50-27-220.png, iostat.png
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> An out-of-memory problem occurred during the last long-term test of the
> branch "add_disabled_mem_control":
> !image-2019-04-22-12-38-04-903.png!
> We analyzed the causes and propose solutions as follows:
> # *Flushing to disk may double the memory cost*: A storage group
> maintains a list of ChunkGroups in memory, which is flushed to disk when
> its occupied memory exceeds the threshold (128 MB by default).
> ## In the current implementation, when a flush starts, each ChunkGroup is
> encoded in memory, producing a new byte array that is kept in memory. Only
> after all ChunkGroups have been encoded are the byte arrays released
> together. Since a byte array is comparable in size to the original data
> (0.5× to 1×), this strategy may double the memory usage in the worst case.
> ## Solution: the flush strategy needs to be redesigned. In TsFile, a Page
> is the minimal flush unit: a ChunkGroup contains several Chunks and a
> Chunk contains several Pages. Once a Page is encoded into a byte array, we
> can flush that byte array to disk and release it immediately. In this case,
> the extra memory is at most one page size (64 KB by default). This
> modification involves a list of cascading changes, including the metadata
> format and the writing process.
> # *Memory Control Strategy*: the memory control strategy needs to be
> redesigned, e.g., assigning 60% of memory to the writing process and 30%
> to the querying process. The writing memory covers the memtables and the
> flush process. When an Insert arrives, if its required memory exceeds
> TotalMem * 0.6 - MemTableUsage - FlushUsage, the Insert is rejected.
> # *Are the memory statistics accurate?* In the current code, the memory
> usage of a TSRecord Java object, which corresponds to one Insert SQL
> statement, is calculated by summing the sizes of its DataPoints. E.g., for
> "insert into root.a.b.c(timestamp, v1, v2) values(1L, true, 1.2f)", the
> counted usage is 8 + 1 + 4 = 13 bytes, which ignores object headers and
> other overhead. The memory accounting needs to be redesigned carefully.
> # *Is there still a memory leak?* The log of the last crash caused by the
> out-of-memory exception shows that actual JVM memory usage was 18 GB,
> whereas our memory statistics module only counted 8 GB. Beyond the
> inaccuracy mentioned in Q3, we suspect there are still memory leaks or
> other potential problems. We will continue to debug this.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)