Hi, I ran the performance evaluation as well and reached a similar conclusion.

Hardware: macOS 10.15.4, 2.9 GHz Intel Core i5, 8G memory.
Data set: 1 storage group, 1 device, 3000 measurements, each timeseries has 600000 data points, long data type.

1. select s1 from root.sg1.d1
   new_TsFile: 2572
   master: 2666

2. select s1, s2, s3, s4, s5, s6, s7, s8, s9, s10 from root.sg1.d1
   new_TsFile: 5455
   master: 6146

3. select count(s1) from root.sg1.d1
   new_TsFile: 570
   master: 1510

4. select count(s1), count(s2), count(s3), count(s4), count(s5), count(s6), count(s7), count(s8), count(s9), count(s10) from root.sg1.d1
   new_TsFile: 2132
   master: 3675

5. select count(s1), count(s2), count(s3), count(s4), count(s5), count(s6), count(s7), count(s8), count(s9), count(s10), count(s11), count(s12), count(s13), count(s14), count(s15), count(s16), count(s17), count(s18), count(s19), count(s20) from root.sg1.d1
   new_TsFile: 2874
   master: 5357

Thanks,
Haonan Hou

On Mar 30, 2020, at 10:02 PM, Jialin Qiao <[email protected]> wrote:

Hi,

The new TsFile structure (version 2) is ready [1]. Write speed is not affected, and queries are faster, especially aggregation queries.

【Performance evaluation】

Hardware: macOS 10.14.5, 2.2 GHz Intel Core i7, 4G memory.
Data set: 1 storage group, 1 device, 3000 measurements, each timeseries has 600000 data points, long data type.

IoTDB configuration:
enable_parameter_adapter=false
tsfile_size_threshold=1024L
memtable_size_threshold=50*1024*1024L

[Write evaluation]

new_TsFile: 300569ms, 14.76G, 184 tsfiles
master: 300418ms, 14.73G, 184 tsfiles

[Query evaluation]

select s1 from root.sg1.d1
new_TsFile: 1349ms
master: 2102ms

select s1, s2, s3, s4, s5, s6, s7, s8, s9, s10 from root.sg1.d1
new_TsFile: 3268ms
master: 4621ms

select * from root
new_TsFile: 647934ms
master: 814206ms

select count(s1) from root.sg1.d1
new_TsFile: 421ms
master: 1654ms

select count(s1), count(s2), count(s3), count(s4), count(s5), count(s6), count(s7), count(s8), count(s9), count(s10) from root.sg1.d1
new_TsFile: 1887ms
master: 4231ms

select count(s1), count(s2), count(s3), count(s4), count(s5), count(s6), count(s7), count(s8), count(s9), count(s10), count(s11), count(s12), count(s13), count(s14), count(s15), count(s16), count(s17), count(s18), count(s19), count(s20), count(s21), count(s22), count(s23), count(s24), count(s25), count(s26), count(s27), count(s28), count(s29), count(s30) from root.sg1.d1
new_TsFile: 3066ms
master: 6653ms

select count(*) from root
new_TsFile: 2243ms
master: 614638ms

【Design of new TsFile】

In the previous version, ChunkMetadata is stored by device. Therefore, to query one series, we need to read the ChunkMetadata of all measurements of its device, which is time-consuming.

In the new version, ChunkMetadata is grouped by time series, so to query one series we only need to read the ChunkMetadata of that series. A file-level statistics structure, TimeseriesMetadata, is added for each series to accelerate aggregations.

In addition, by modifying the schema management of TsFile, the constraint that measurements with the same name in one storage group must have the same data type is removed.
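
To make the difference between the two layouts concrete, here is a minimal toy sketch in Python. It is not the TsFile implementation: the record shape, the function names, and the numbers are all made up for illustration. It only shows why grouping metadata by series reduces the number of entries read for a single-series query, and why a per-series total lets a count be answered without touching the chunk entries.

```python
from collections import defaultdict

# Toy chunk metadata records: (device, measurement, point_count).
# Hypothetical data, not taken from a real TsFile.
CHUNKS = [
    ("root.sg1.d1", "s1", 100),
    ("root.sg1.d1", "s2", 100),
    ("root.sg1.d1", "s1", 50),
    ("root.sg1.d1", "s3", 70),
]


def index_by_device(chunks):
    """Old-style layout (assumed): one metadata list per device."""
    index = defaultdict(list)
    for device, measurement, count in chunks:
        index[device].append((measurement, count))
    return index


def index_by_series(chunks):
    """New-style layout (assumed): one metadata list per series,
    plus a per-series total standing in for file-level statistics."""
    index = defaultdict(list)
    totals = defaultdict(int)
    for device, measurement, count in chunks:
        path = f"{device}.{measurement}"
        index[path].append(count)
        totals[path] += count
    return index, totals


def entries_read_old(index, device):
    # Every entry of the device is read, then filtered by measurement.
    return len(index[device])


def entries_read_new(index, path):
    # Only the entries of the requested series are read.
    return len(index[path])


old_index = index_by_device(CHUNKS)
new_index, series_totals = index_by_series(CHUNKS)

old_cost = entries_read_old(old_index, "root.sg1.d1")
new_cost = entries_read_new(new_index, "root.sg1.d1.s1")
count_s1 = series_totals["root.sg1.d1.s1"]  # answered from the statistics alone
```

With 3000 measurements per device, as in the data set above, the by-device lookup in this toy model would touch entries for all 3000 measurements to serve one series, which is consistent with the larger gains reported for the count queries.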

[1] https://github.com/apache/incubator-iotdb/pull/855

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院
