Re: New TsFile Structure

Haonan Hou Mon, 30 Mar 2020 19:04:22 -0700

Hi,

I ran the performance evaluation as well and reached a similar conclusion.


Hardware: macOS 10.15.4, 2.9 GHz Intel Core i5, 8G memory.
Data set: 1 storage group, 1 device, 3000 measurements; each timeseries has
600000 data points, long data type.

1. select s1 from root.sg1.d1
new_TsFile: 2572
master: 2666

2. select s1, s2, s3, s4, s5, s6, s7, s8, s9, s10 from root.sg1.d1
new_TsFile: 5455
master: 6146

3. select count(s1) from root.sg1.d1
new_TsFile: 570
master: 1510

4. select count(s1), count(s2), count(s3), count(s4), count(s5), count(s6),
count(s7), count(s8), count(s9), count(s10) from root.sg1.d1
new_TsFile: 2132
master: 3675

5. select count(s1), count(s2), count(s3), count(s4), count(s5), count(s6),
count(s7), count(s8), count(s9), count(s10), count(s11), count(s12),
count(s13), count(s14), count(s15), count(s16), count(s17), count(s18),
count(s19), count(s20) from root.sg1.d1
new_TsFile: 2874
master: 5357

Thanks,
Haonan Hou


On Mar 30, 2020, at 10:02 PM, Jialin Qiao wrote:

Hi,


The new TsFile structure (version 2) is ready [1].


Write speed is unaffected, and queries are faster, especially aggregation
queries.


【Performance evaluation】



Hardware: macOS 10.14.5, 2.2 GHz Intel Core i7, 4G memory.

Data set: 1 storage group, 1 device, 3000 measurements; each timeseries has
600000 data points, long data type.

IoTDB configuration:

enable_parameter_adapter=false
tsfile_size_threshold=1024L
memtable_size_threshold=5010241024L

[Write evaluation]

new_TsFile: 300569ms, 14.76G, 184 tsfiles
master: 300418ms, 14.73G, 184 tsfiles

[Query evaluation]

select s1 from root.sg1.d1

new_TsFile: 1349ms
master: 2102ms

select s1, s2, s3, s4, s5, s6, s7, s8, s9, s10 from root.sg1.d1

new_TsFile: 3268ms
master: 4621ms

select * from root

new_TsFile: 647934ms
master: 814206ms

select count(s1) from root.sg1.d1

new_TsFile: 421ms
master: 1654ms

select count(s1), count(s2), count(s3), count(s4), count(s5), count(s6), 
count(s7), count(s8), count(s9), count(s10) from root.sg1.d1

new_TsFile: 1887ms
master: 4231ms

select count(s1), count(s2), count(s3), count(s4), count(s5), count(s6), 
count(s7), count(s8), count(s9), count(s10), count(s11), count(s12), 
count(s13), count(s14), count(s15), count(s16), count(s17), count(s18), 
count(s19), count(s20), count(s21), count(s22), count(s23), count(s24), 
count(s25), count(s26), count(s27), count(s28), count(s29), count(s30) from 
root.sg1.d1

new_TsFile: 3066ms
master: 6653ms

select count(*) from root

new_TsFile: 2243ms
master: 614638ms
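
To make the comparison easier to read, the query timings above can be turned into speedup ratios (master time divided by new_TsFile time). The short Python sketch below only does that arithmetic on the numbers reported in this message; the shortened query labels and the helper name `speedup` are illustrative, not part of IoTDB. On these figures the raw-data queries come out at roughly 1.3x to 1.6x, the count queries at roughly 2x to 4x, and count(*) at roughly 274x.

```python
# Query timings in ms, copied from the figures above: (new_TsFile, master).
# The dictionary keys are shortened labels for the queries, chosen for display.
timings_ms = {
    "select s1": (1349, 2102),
    "select s1..s10": (3268, 4621),
    "select * from root": (647934, 814206),
    "count(s1)": (421, 1654),
    "count(s1..s10)": (1887, 4231),
    "count(s1..s30)": (3066, 6653),
    "count(*) from root": (2243, 614638),
}


def speedup(new_ms, master_ms):
    """Return how many times faster the new format is than master."""
    return master_ms / new_ms


speedups = {query: speedup(new, old) for query, (new, old) in timings_ms.items()}

for query, ratio in speedups.items():
    print(f"{query:<20} {ratio:6.1f}x")
```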





【Design of new TsFile】


In the previous version, ChunkMetadata was stored by device. Therefore, to
query one series, we had to read the ChunkMetadata of all measurements of its
device, which is time consuming.


In the new version, ChunkMetadata is grouped by time series, so querying one
series only requires reading the ChunkMetadata of that series. In addition, a
file-level statistics entry, TimeseriesMetadata, is added for each series to
accelerate aggregations.
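
The difference between the two layouts can be illustrated with a small in-memory model. This is only a sketch in Python under simplifying assumptions, not the actual TsFile code or on-disk format: `ChunkMeta`, `build_device_index`, `build_series_index` and the two `chunks_read_*` helpers are hypothetical names, and a point count stands in for the full statistics kept per series.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ChunkMeta:
    """Hypothetical stand-in for one ChunkMetadata entry."""
    series: str  # full path, e.g. "root.sg1.d1.s1"
    count: int   # number of points in the chunk


def device_of(series):
    """Strip the measurement name to get the device path."""
    return series.rsplit(".", 1)[0]


def build_device_index(chunks):
    """Old layout: one metadata block per device, holding all its chunks."""
    index = defaultdict(list)
    for chunk in chunks:
        index[device_of(chunk.series)].append(chunk)
    return index


def build_series_index(chunks):
    """New layout: chunks grouped per series, plus a file-level point count."""
    index = defaultdict(list)
    for chunk in chunks:
        index[chunk.series].append(chunk)
    file_stats = {s: sum(c.count for c in cs) for s, cs in index.items()}
    return index, file_stats


def chunks_read_old(device_index, series):
    """Load the whole device block, then filter for the requested series."""
    block = device_index[device_of(series)]
    return len(block), [c for c in block if c.series == series]


def chunks_read_new(series_index, series):
    """Load only the entries that belong to the requested series."""
    block = series_index[series]
    return len(block), block


# 1 device, 3000 measurements, 2 chunks of 100 points each (made-up sizes).
chunks = [ChunkMeta(f"root.sg1.d1.s{i}", 100)
          for i in range(1, 3001) for _ in range(2)]
device_index = build_device_index(chunks)
series_index, file_stats = build_series_index(chunks)

target = "root.sg1.d1.s1"
print(chunks_read_old(device_index, target)[0])  # entries touched, old layout
print(chunks_read_new(series_index, target)[0])  # entries touched, new layout
print(file_stats[target])  # count(s1) answered without touching any chunk entry
```

In this model the old layout touches every entry of the device to answer a single-series query, while the new layout touches only that series' entries, and a count can be served from the per-series statistics alone.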


Besides, by modifying the schema management of TsFile, the constraint that
measurements with the same name in one storage group must have the same data
type is removed.
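
To show what this constraint means in practice, here is a toy Python model. It is an assumption-laden illustration, not how the TsFile schema is implemented: the classes `NameKeyedSchema` and `PathKeyedSchema` and the type strings are made up for the example. Keying the type by measurement name forces one type per name, while keying it by full series path lets two devices use the same measurement name with different types.

```python
class NameKeyedSchema:
    """Toy model of the old rule: one data type per measurement name."""

    def __init__(self):
        self.types = {}

    def register(self, path, dtype):
        name = path.rsplit(".", 1)[1]  # measurement name, e.g. "s1"
        existing = self.types.setdefault(name, dtype)
        if existing != dtype:
            raise ValueError(f"{name} already registered as {existing}")


class PathKeyedSchema:
    """Toy model without the rule: the data type is tracked per full path."""

    def __init__(self):
        self.types = {}

    def register(self, path, dtype):
        existing = self.types.setdefault(path, dtype)
        if existing != dtype:
            raise ValueError(f"{path} already registered as {existing}")


for schema in (NameKeyedSchema(), PathKeyedSchema()):
    schema.register("root.sg1.d1.s1", "INT64")
    try:
        schema.register("root.sg1.d2.s1", "FLOAT")
        print(type(schema).__name__, "accepts s1 with a different type on d2")
    except ValueError as err:
        print(type(schema).__name__, "rejects it:", err)
```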




[1] https://github.com/apache/incubator-iotdb/pull/855


Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院
