xiangdong Huang created IOTDB-87:
------------------------------------

             Summary: Improve Overflow File Reader to save memory
                 Key: IOTDB-87
                 URL: https://issues.apache.org/jira/browse/IOTDB-87
             Project: Apache IoTDB
          Issue Type: Improvement
            Reporter: xiangdong Huang


Hi, after reading the source code of `SeriesReaderFactory.createUnSeqMergeReader`, 
I think the implementation may use too much memory, and it also opens a read 
stream that is never used.

In this function, when you call `chunkLoader.getChunk(chunkMetaData)`, a 
complete Chunk (with its raw data) is loaded into memory. 

I think (though I am not sure whether it is a good idea) we could store only 
the ChunkMetaData in memory (in a concise format, i.e., keeping just the useful 
fields) and read at most one page per Chunk at a time (if a Chunk's start time 
is later than the other chunks', we do not need to read it yet). That should 
be fine.
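A minimal sketch of this idea (all class and field names below are hypothetical, not the actual IoTDB API): keep a ~24-byte record per chunk in a priority queue ordered by start time, and only load a page for the chunk at the head of the queue.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class ConciseChunkIndex {
    /** Minimal per-chunk record: three longs (~24 B of payload) instead of the full Chunk. */
    static final class ConciseChunkMeta {
        final long startTime;   // used to decide which chunk to read next
        final long endTime;
        final long fileOffset;  // where to seek when this chunk's page is actually needed
        ConciseChunkMeta(long startTime, long endTime, long fileOffset) {
            this.startTime = startTime;
            this.endTime = endTime;
            this.fileOffset = fileOffset;
        }
    }

    // Chunks ordered by start time; only the head chunk's page has to be in memory.
    private final PriorityQueue<ConciseChunkMeta> queue =
            new PriorityQueue<>(Comparator.comparingLong(m -> m.startTime));

    void register(long startTime, long endTime, long fileOffset) {
        queue.add(new ConciseChunkMeta(startTime, endTime, fileOffset));
    }

    /** Returns the chunk whose page should be loaded next; later chunks stay unread. */
    ConciseChunkMeta next() {
        return queue.poll();
    }
}
```

With this layout the reader would seek to `fileOffset` and decode a single page only when the chunk reaches the head of the queue, rather than materializing every Chunk up front.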

Suppose a page size is 64 KB, and in the worst case each Chunk contains only 
one page. Then 1 TB of overflow data yields 1 TB / 64 KB ≈ 16.8 million 
ChunkMetadata entries, and if we keep them in memory concisely (it seems three 
longs are enough: the start time, the end time, and the offset in the file), 
the memory cost is about 16.8M * (8+8+8) B ≈ 384 MB. (By contrast, the total 
memory cost of the first page of each chunk would be 64 KB * 16.8M = 1 TB... 
so we cannot keep all of them in memory...)
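The estimate can be checked with a few lines of arithmetic (1 TB / 64 KB gives about 16.8 million single-page chunks; the constants here are the assumptions from the paragraph above, not measured values):

```java
public class OverflowMemoryEstimate {
    public static void main(String[] args) {
        long pageSize = 64L * 1024;                 // assumed page size: 64 KB
        long totalData = 1L << 40;                  // assumed overflow data: 1 TB
        long chunkCount = totalData / pageSize;     // worst case: one page per chunk
        long metaBytes = chunkCount * (8 + 8 + 8);  // three longs per concise entry
        long pageBytes = chunkCount * pageSize;     // every first page kept resident

        System.out.println(chunkCount);             // 16777216 chunks
        System.out.println(metaBytes >> 20);        // 384 (MB of concise metadata)
        System.out.println(pageBytes >> 30);        // 1024 (GB if all pages stay in memory)
    }
}
```

So the concise index fits in a few hundred megabytes, while holding one page per chunk would need the full terabyte.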

Does anyone have a good idea? (  [~kakayu] )

By the way, in the `EngineChunkReader` class, the file input stream 
(TsFileSequenceReader) is never used, so we do not need to keep it as a field.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
