Hi,

After merging PRs #169, #149 and #144, an IoTDB instance (using the master
branch code) can now run stably: it has been running for several days and
has persisted several TBs of data in our environment. Next, we will run it
on apache-VM.

However, much work remains to be done to provide more precise control over
memory usage.

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Julian Feinauer <[email protected]> wrote on Mon, Apr 29, 2019 at 1:11 AM:

> Hi Xiangdong,
>
> first, thanks for bringing it back to the list and for the excellent design
> document.
> I only went over it briefly; I need some more time to read it in detail
> and give, if I can, some sensible feedback.
>
> Julian
>
> On 28.04.19, 15:41, "Xiangdong Huang" <[email protected]> wrote:
>
>     Hi,
>
>     Tian Jiang and I discussed this issue and proposed a new design to
>     control the memory usage of Overflow.
>
>     I leave the design document at:
>
>     https://cwiki.apache.org/confluence/display/IOTDB/New+Design+of+Overflow+and+the+Mergence+Process
>
>
>     Please leave your comments at
>     https://issues.apache.org/jira/projects/IOTDB/issues/IOTDB-84 or on
>     this mailing list.
>
>     Best,
>
>     -----------------------------------
>     Xiangdong Huang
>     School of Software, Tsinghua University
>
>      黄向东
>     清华大学 软件学院
>
>
>     Xiangdong Huang <[email protected]> wrote on Mon, Apr 22, 2019 at 12:50 PM:
>
>     > Hi,
>     >
>     > I think we can split tasks 1~3 into sub-tasks in JIRA.
>     >
>     > Also, I recommend studying how Cassandra manages memory (in the
>     > package org.apache.cassandra.utils.memory) and then designing our
>     > strategy.
>     >
>     > Best,
>     > -----------------------------------
>     > Xiangdong Huang
>     > School of Software, Tsinghua University
>     >
>     >  黄向东
>     > 清华大学 软件学院
>     >
>     >
>     > kangrong (JIRA) <[email protected]> wrote on Mon, Apr 22, 2019 at 12:42 PM:
>     >
>     >> kangrong created IOTDB-84:
>     >> -----------------------------
>     >>
>     >>              Summary: Out-of-Memory bug
>     >>                  Key: IOTDB-84
>     >>                  URL: https://issues.apache.org/jira/browse/IOTDB-84
>     >>              Project: Apache IoTDB
>     >>           Issue Type: Bug
>     >>             Reporter: kangrong
>     >>          Attachments: image-2019-04-22-12-38-04-903.png
>     >>
>     >> An out-of-memory problem occurred in the last long-term test of the
>     >> branch "add_disabled_mem_control":
>     >>
>     >> !image-2019-04-22-12-38-04-903.png!
>     >>
>     >> We analyzed the reasons and propose to solve it as follows:
>     >>  # *Flushing to disk may double the memory cost*: A storage group
>     >> maintains a list of ChunkGroups in memory, which is flushed to disk
>     >> when its occupied memory exceeds the threshold (128 MB by default).
>     >>  ## In the current implementation, when a flush starts, each
>     >> ChunkGroup is encoded in memory, so a new byte array is kept in
>     >> memory. The byte arrays can only be released together once all
>     >> ChunkGroups have been encoded. Since a byte array has a size
>     >> comparable to the original data (0.5× to 1×), this strategy may
>     >> double the memory usage in the worst case.
>     >>  ## Solution: the flush strategy needs to be redesigned. In TsFile,
>     >> a Page is the minimal flush unit: a ChunkGroup contains several
>     >> Chunks and a Chunk contains several Pages. Once a page is encoded
>     >> into a byte array, we can flush that byte array to disk and then
>     >> release it. In this case, the extra memory is at most one page size
>     >> (64 KB by default). This modification involves a series of cascading
>     >> changes, including the metadata format and the writing process (a
>     >> rough sketch follows this list).
>     >>  # *Memory control strategy*: the memory control strategy needs to
>     >> be redesigned, for example by assigning 60% of the memory to the
>     >> writing process and 30% to the querying process. The writing memory
>     >> includes the memtable and the flush process. When an Insert arrives,
>     >> if its required memory exceeds TotalMem * 0.6 - MemTableUsage -
>     >> FlushUsage, the Insert will be rejected (see the admission check in
>     >> the sketch below).
>     >>  # *Are the memory statistics accurate?* In the current code, the
>     >> memory usage of a TSRecord Java object, which corresponds to one
>     >> Insert SQL statement, is calculated by summing up its DataPoints.
>     >> E.g., for "insert into root.a.b.c(timestamp,v1, v2) values(1L, true,
>     >> 1.2f)", the usage is counted as 8 + 1 + 4 = 13 bytes, which ignores
>     >> object headers and other overhead (cf. the sketch below). The memory
>     >> accounting needs to be redesigned carefully.
>     >>  # *Is there still a memory leak?* As shown in the log of the last
>     >> crash caused by the out-of-memory exception, the actual JVM memory
>     >> usage was 18 GB, whereas our memory statistics module only counted
>     >> 8 GB. Besides the inaccuracy mentioned in point 3, we suspect there
>     >> are still memory leaks or other potential problems. We will continue
>     >> to debug this.
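>     >>
>     >> For illustration only, here is a minimal sketch of points 1-3 above,
>     >> using hypothetical class and method names rather than the actual
>     >> IoTDB code: (a) a page-level flush loop that keeps at most one
>     >> encoded page in memory, and (b) the write admission check together
>     >> with the naive per-record size estimate.
>     >>
>     >> import java.io.IOException;
>     >> import java.io.OutputStream;
>     >> import java.util.List;
>     >>
>     >> public class MemoryControlSketch {
>     >>
>     >>   // (a) Page-level flush: encode and write one page at a time, so
>     >>   // the extra memory stays around one page (~64 KB) instead of a
>     >>   // whole encoded ChunkGroup.
>     >>   interface Page {
>     >>     byte[] encode();            // encode this page into a byte array
>     >>   }
>     >>
>     >>   static void flushChunk(List<Page> pages, OutputStream out)
>     >>       throws IOException {
>     >>     for (Page page : pages) {
>     >>       byte[] bytes = page.encode();  // only one encoded page is held
>     >>       out.write(bytes);              // write it immediately ...
>     >>       // ... and let the array become garbage before the next page
>     >>     }
>     >>     out.flush();
>     >>   }
>     >>
>     >>   // (b) Admission control: reject an Insert whose estimated size
>     >>   // does not fit into the 60% write budget minus the current
>     >>   // memtable and flush usage.
>     >>   static final double WRITE_RATIO = 0.6;
>     >>
>     >>   static boolean admitInsert(long insertSize, long totalMem,
>     >>                              long memTableUsage, long flushUsage) {
>     >>     long writeBudget =
>     >>         (long) (totalMem * WRITE_RATIO) - memTableUsage - flushUsage;
>     >>     return insertSize <= writeBudget;  // otherwise reject the Insert
>     >>   }
>     >>
>     >>   // Naive size estimate of "insert into root.a.b.c(timestamp,v1,v2)
>     >>   // values(1L, true, 1.2f)": 8 (long) + 1 (boolean) + 4 (float) = 13
>     >>   // bytes, ignoring object headers, references and alignment, which
>     >>   // add roughly 16 bytes per object on a 64-bit JVM.
>     >>   static long naiveRecordSize() {
>     >>     return Long.BYTES + 1 /* boolean */ + Float.BYTES;
>     >>   }
>     >> }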
>     >>
>     >>
>     >>
>     >>
>     >>
>     >> --
>     >> This message was sent by Atlassian JIRA
>     >> (v7.6.3#76005)
>     >>
>     >
>
>
>
