Hi,

Interesting thoughts!

(1) A page-level index would help in the scenario where a chunk contains many 
pages.
When a chunk has only a few pages, reading the whole chunk in one pass is 
probably still fine, so we could make the page index optional.
(2) A queried BatchData is never modified, and it is discarded after being 
returned to the client through RPC. We could keep a pool of BatchData objects, 
just like the MemtablePool, and reuse them.
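
To make both points concrete, here is a rough sketch in Java. It is not the
actual TsFile code: PageRange, Batch, BatchPool and pagesToRead are
hypothetical names made up for illustration, and Batch is only a stand-in for
BatchData. Part (1) selects the pages whose time range overlaps the query
range; part (2) recycles batch objects through a small pool.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these classes are the real IoTDB/TsFile types.
final class ReaderSketch {

  /** Hypothetical page index entry: the time range covered by one page. */
  static final class PageRange {
    final long startTime;
    final long endTime;

    PageRange(long startTime, long endTime) {
      this.startTime = startTime;
      this.endTime = endTime;
    }
  }

  /** (1) Indexes of the pages whose time range overlaps [queryStart, queryEnd]. */
  static List<Integer> pagesToRead(List<PageRange> pages, long queryStart, long queryEnd) {
    List<Integer> selected = new ArrayList<>();
    for (int i = 0; i < pages.size(); i++) {
      PageRange p = pages.get(i);
      if (p.startTime <= queryEnd && p.endTime >= queryStart) {
        selected.add(i);
      }
    }
    return selected;
  }

  /** Hypothetical stand-in for BatchData: a fixed-capacity, resettable buffer. */
  static final class Batch {
    final long[] times;
    final double[] values;
    int size;

    Batch(int capacity) {
      times = new long[capacity];
      values = new double[capacity];
    }

    void add(long time, double value) {
      times[size] = time;
      values[size] = value;
      size++;
    }

    void reset() {
      size = 0; // arrays are kept and overwritten on reuse
    }
  }

  /** (2) Minimal pool: hands out recycled batches instead of allocating new ones. */
  static final class BatchPool {
    private final ArrayDeque<Batch> free = new ArrayDeque<>();
    private final int batchCapacity;
    int allocated; // number of real allocations, kept only for illustration

    BatchPool(int batchCapacity) {
      this.batchCapacity = batchCapacity;
    }

    synchronized Batch acquire() {
      Batch b = free.poll();
      if (b == null) {
        allocated++;
        b = new Batch(batchCapacity);
      }
      return b;
    }

    synchronized void release(Batch b) {
      b.reset();
      free.push(b);
    }
  }

  public static void main(String[] args) {
    List<PageRange> pages =
        Arrays.asList(new PageRange(0, 9), new PageRange(10, 19), new PageRange(20, 29));
    System.out.println(pagesToRead(pages, 12, 21)); // pages 1 and 2 overlap the query

    BatchPool pool = new BatchPool(1024);
    for (int page = 0; page < 10_000; page++) {
      Batch b = pool.acquire();
      b.add(page, page * 0.5);
      pool.release(b); // e.g. once the RPC response has been written
    }
    System.out.println(pool.allocated); // one allocation serves all 10000 pages
  }
}
```

The pool only helps if a batch is released after the RPC layer has finished
serializing it, so where release() is called would need care in the real
code.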

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院

> -----Original Message-----
> From: "DaweiLiu (Jira)" <[email protected]>
> Sent: 2020-02-21 23:48:00 (Friday)
> To: [email protected]
> Cc: 
> Subject: [jira] [Created] (IOTDB-509) Optimize TsFileReader to reduce unnecessary 
> GC and IO.
> 
> DaweiLiu created IOTDB-509:
> ------------------------------
> 
>              Summary: Optimize TsFileReader to reduce unnecessary GC and IO.
>                  Key: IOTDB-509
>                  URL: https://issues.apache.org/jira/browse/IOTDB-509
>              Project: Apache IoTDB
>           Issue Type: Wish
>           Components: Core/TsFile
>             Reporter: DaweiLiu
> 
> 
> I think there are still two parts of TsFile that can be optimized:
>  # Reduce unnecessary IO. Reads are currently carried out at the Chunk 
> level. I think we can bring the page index into play. When the time range of 
> the filter contains the chunk's time range, all chunk data is read and 
> returned directly. When the two ranges only intersect, we can use the page 
> index to decide which pages to read, which avoids reading unnecessary data.
>  # Reduce GC. Data is returned as a BatchData structure, and each BatchData 
> holds the amount of data in one page. Every call to next() therefore creates 
> a new BatchData; if a query goes through thousands of pages (say 10000), it 
> creates that many new BatchData objects. So I think we should decouple the 
> returned data from the page. We still do IO and deserialization / decoding 
> from disk one page at a time, but what is handed to the caller should be a 
> reusable data structure of fixed length, just like read(ByteBuffer) in the 
> JDK.
> 
> 
> 
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
