Hi,

Maybe we should put the PageHeaders into the ChunkHeader. If we put the PageHeaders into the ChunkMetadata, we could no longer read a TsFile sequentially.
Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院

> -----Original Message-----
> From: "Dawei Liu" <[email protected]>
> Sent: 2020-02-24 16:40:05 (Monday)
> To: [email protected]
> Cc:
> Subject: [DISCUSS] Optimize TsFile structure to reduce unnecessary IO
>
> Hi,
>
> In the current TsFile structure, each PageHeader and its PageData are stored
> next to each other inside a Chunk, like a chain structure [1].
>
> The basic unit that is read from the hard disk each time is the Chunk.
> For the device * sensor query scenario, it appears that we read too much
> data, so we considered a new optimization direction: use the PageHeaders
> to filter the data first, so that we can be more precise about which Pages
> need to be read.
>
> But we are still debating where to put the PageHeaders:
>
> 1. Put the PageHeaders into the ChunkMetaData.
>    The nice thing about this is that we can start filtering the data once
>    the IO is done.
>
> 2. Put the PageHeaders into the ChunkHeader.
>    We then need to read the PageHeaders one more time, but the advantage is
>    that we save more memory when we read the List of the device.
>
> For details, please see [2].
>
> What do you think?
>
> Regards,
> ---
> Dawei Liu
>
> [1] https://user-images.githubusercontent.com/33376433/69341240-26012300-0ca4-11ea-91a1-d516810cad44.png
> [2] https://issues.apache.org/jira/secure/attachment/12994279/131582515824_.pic_hd.jpg
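
To make the filtering idea from the quoted proposal concrete, here is a minimal sketch of how PageHeaders could be used to pick the Pages worth reading. It is only an illustration: the `PageHeader` fields (`offset`, `size`, `start_time`, `end_time`) and the `pages_to_read` helper are assumptions made for this example, not the actual TsFile layout or API, and it works the same way whether the headers come from the ChunkMetaData or from the ChunkHeader.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PageHeader:
    # Hypothetical fields; the real TsFile PageHeader layout may differ.
    offset: int      # byte offset of the page data inside the chunk
    size: int        # length of the page data in bytes
    start_time: int  # smallest timestamp stored in the page
    end_time: int    # largest timestamp stored in the page


def pages_to_read(headers: List[PageHeader],
                  query_start: int,
                  query_end: int) -> List[Tuple[int, int]]:
    """Return (offset, size) for every page whose time range overlaps the query."""
    return [
        (h.offset, h.size)
        for h in headers
        if h.start_time <= query_end and h.end_time >= query_start
    ]


# Three pages of 100 bytes each, covering timestamps 0-9, 10-19 and 20-29.
headers = [
    PageHeader(offset=0, size=100, start_time=0, end_time=9),
    PageHeader(offset=100, size=100, start_time=10, end_time=19),
    PageHeader(offset=200, size=100, start_time=20, end_time=29),
]

# A query for timestamps 12..21 only needs the second and third page.
selected = pages_to_read(headers, query_start=12, query_end=21)
```

With the headers available up front, only the byte ranges in `selected` would have to be read from disk instead of the whole Chunk.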
