kangpinghuang commented on a change in pull request #1267: Add new file format design markdown
URL: https://github.com/apache/incubator-doris/pull/1267#discussion_r291893435
##########
File path: docs/documentation/cn/extending-doris/doris_storage_optimization.md
##########
@@ -0,0 +1,198 @@
+# Doris Storage File Format Optimization #
+
+## File Format ##
+
+
+<center>Figure 1. Doris segment file format</center>
+
+The file consists of:
+- An 8-byte magic code at the start of the file, used to identify the file format and version
+- Data Region: stores the data of each column; data here is loaded on demand, page by page
+- Index Region: Doris stores the index data of all columns together in the Index Region; data here is loaded at column granularity, so it is stored separately from the column data
+- Footer
+  - FileFooterPB: defines the file's metadata
+  - A 4-byte checksum of the footer PB content
+  - A 4-byte FileFooterPB message length, used to read the FileFooterPB
+  - An 8-byte MAGIC CODE; it is stored at the end so that the file type can be identified conveniently in different scenarios
+
+Data in the file is organized into pages; a page is the basic unit of encoding and compression. The current page types are:
+
+### DataPage ###
+
+There are two kinds of DataPage: nullable and non-nullable.
+
+A nullable data page contains:
+```
+
+   +----------------+

Review comment:
In this design, the first rowid is encoded in the data. I agree with you that it is better to move the first rowid to the page header.
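
For readers following the quoted design, the footer layout it lists (FileFooterPB, then a 4-byte checksum, a 4-byte message length, and an 8-byte magic code) can be read from the end of the file backwards. The sketch below is only an illustration under stated assumptions and is not taken from the pull request: the magic value `DORISSEG`, the little-endian byte order, the use of CRC32 as the checksum, and the function names `build_segment` and `read_footer` are all hypothetical, and the footer payload is treated as opaque bytes rather than a real protobuf message.

```python
import struct
import zlib

# Hypothetical 8-byte magic code; the design does not state the real value.
MAGIC = b"DORISSEG"
# Trailer after FileFooterPB: 4-byte checksum + 4-byte length + 8-byte magic.
TRAILER_SIZE = 4 + 4 + len(MAGIC)


def build_segment(data_region: bytes, index_region: bytes, footer_pb: bytes) -> bytes:
    """Assemble a toy segment: magic, Data Region, Index Region, footer."""
    # Assumption: CRC32 checksum and little-endian integers.
    checksum = zlib.crc32(footer_pb) & 0xFFFFFFFF
    trailer = struct.pack("<II", checksum, len(footer_pb)) + MAGIC
    return MAGIC + data_region + index_region + footer_pb + trailer


def read_footer(segment: bytes) -> bytes:
    """Return the serialized FileFooterPB bytes, validating magic and checksum."""
    if len(segment) < len(MAGIC) + TRAILER_SIZE:
        raise ValueError("segment too short")
    if segment[:len(MAGIC)] != MAGIC or segment[-len(MAGIC):] != MAGIC:
        raise ValueError("bad magic code")

    # The checksum and length sit immediately before the trailing magic code.
    checksum, length = struct.unpack("<II", segment[-TRAILER_SIZE:-len(MAGIC)])

    # The footer message ends where the trailer begins.
    end = len(segment) - TRAILER_SIZE
    start = end - length
    if start < len(MAGIC):
        raise ValueError("footer length out of range")

    footer_pb = segment[start:end]
    if zlib.crc32(footer_pb) & 0xFFFFFFFF != checksum:
        raise ValueError("footer checksum mismatch")
    return footer_pb
```

Because the length and the magic code are at fixed offsets from the end of the file, a reader can locate the footer with a single read of the tail, without scanning the Data Region or Index Region first.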
