Hi, glad to kill Thanos :) I agree to merge kill_thanos into master. We have already made an unofficial release (v0.7.1) of master on GitHub as a backup.
As a reminder, despite these optimizations, some features of the master branch are not yet available in kill_thanos:
1. Update and Delete operations.
2. Advanced queries: Aggregation, GroupByTime and Fill.
Besides, the `hasNextBatch` and `nextBatch` methods are implemented in TsFile, but most of the corresponding work in the IoTDB engine remains to be done. The kill_thanos branch changes a lot... We can add these features and further optimize the code in other PRs.

Best.
--
Jialin Qiao
School of Software, Tsinghua University
乔嘉林

> -----Original Message-----
> From: "Xiangdong Huang" <[email protected]>
> Sent: 2019-01-05 11:36:07 (Saturday)
> To: [email protected]
> Cc:
> Subject: Re: merge kill_thanos branch to the master branch
>
> I think the biggest issue with the current master is that the package
> structure is chaotic.
> This issue prevents new developers from understanding the project.
> It is a villain like Thanos in the Marvel Universe. That's why the new
> branch is called kill_Thanos.
>
> Besides what Gaofei mentioned, the storage module, TsFile, is also
> refactored, and the file format has some changes.
> A brief introduction is at
> https://github.com/thulab/iotdb/wiki/%5BTsFile%5D-What-is-new-from-v0.7.0--to-Kill_Thanos
>
>
> In the kill_Thanos branch, the package structure is clearer, but there
> is still a lot of source code that could be refactored better.
> However, merging the modifications from master into kill_Thanos
> takes extra work.
>
> Because all unit tests (UT) and integration tests (IT) in the current
> kill_Thanos branch pass, and the performance is better, I agree to merge
> the branches as soon as possible.
>
> Best,
> -----------------------------------
> Xiangdong Huang
> School of Software, Tsinghua University
>
> 黄向东
>
>
> Gaofei Cao <[email protected]> wrote on Sat, Jan 5, 2019 at 12:23 AM:
>
> > The kill_thanos branch in thulab/iotdb has refactored most features, as listed below.
> >
> >
> > 1. Replacing the single-point calculation logic with batch data loading.
> > In the previous branch, the two most important methods of the IoTDB
> > `Reader` are `hasNext` and `next`, which check whether the queried series
> > has a next point and compute that point, respectively. Invoking these two
> > methods once per point degrades query performance, so we added two new
> > methods, `hasNextBatch` and `nextBatch`. As a result, data is loaded and
> > transferred in batches rather than point by point. This is more
> > CPU-friendly, because the per-call overhead is paid once per batch
> > instead of once per point.
> >
> >
> > 2. Using NIO.
> > In this branch, we replaced ByteArrayInputStream with Java NIO, and we now
> > use `Channel`, `Buffer` and `MMap` (memory-mapped files) more frequently.
> >
> >
> > 3. Adding a file stream manager.
> > A single IoTDB query may involve multiple series, e.g. the SQL
> > `select * from root.vehicle`. To avoid opening the same TsFile multiple
> > times, we adopted a file stream manager, which ensures that each file is
> > opened at most once in IoTDB queries. We use an `ExpiredTimeMap` to manage
> > opened file streams and close files that have not been used for a given
> > expiration time. There may be better ways to manage file stream readers;
> > I will keep track of this.
> >
> >
> > 4. Optimizing filter efficiency.
> > First, we removed the previous Visitor Pattern implementation of
> > filters and adopted a more straightforward implementation.
> > Second, we optimized some filter logic to improve performance. For
> > example, in the SQL
> > `select sensor_0, sensor_1 from device_0 where sensor_1 > 10`,
> > we avoid computing the data of `sensor_1` twice.
> >
> >
> > 5. Others, such as removing Thrift serialization and changing the TsFile
> > file format; maybe someone else can add the details.
> >
> >
> > I suggest merging it into the master branch next week.
> >
> >
> > Experimental results show approximately a 30% to 60% improvement in
> > query performance in the kill_thanos branch.
> >
> >
> > By the way, I am considering how to obtain standard, convincing test
> > data (in the IoT domain) to test the write and query performance of
> > IoTDB. Currently, we just use the data generated by `IoTDB Benchmark`
> > (another project, also available at github.com/thulab/iotdb-benchmark),
> > which generates 100,000 row records for 100 devices with 100 sensors each.
> >
> >
> > Thanks & Best Regards
> >
> >
> > -----------------------------------
> > Cao Gaofei (曹高飞)
> > School of Software,
> > Tsinghua University
> > -----------------------------------
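Point 1 in Gaofei's list contrasts point-at-a-time reading (`hasNext`/`next`) with batch reading (`hasNextBatch`/`nextBatch`). The Java sketch below illustrates the idea on an in-memory series. It is not the TsFile or IoTDB API: `BatchData`, `InMemorySeriesReader`, `BatchReadDemo` and the `batchSize` parameter are hypothetical names chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical types for illustration only; NOT the TsFile/IoTDB API.
final class BatchData {
    final long[] timestamps;
    final double[] values;
    final int size;

    BatchData(long[] timestamps, double[] values) {
        this.timestamps = timestamps;
        this.values = values;
        this.size = timestamps.length;
    }
}

// In-memory reader offering both access styles over the same cursor.
final class InMemorySeriesReader {
    private final long[] times;
    private final double[] values;
    private final int batchSize;  // hypothetical tuning parameter
    private int pos = 0;

    InMemorySeriesReader(long[] times, double[] values, int batchSize) {
        if (times.length != values.length || batchSize <= 0) {
            throw new IllegalArgumentException("length mismatch or batchSize <= 0");
        }
        this.times = times;
        this.values = values;
        this.batchSize = batchSize;
    }

    // Point-at-a-time style: one hasNext()/next() call pair per data point.
    boolean hasNext() {
        return pos < times.length;
    }

    double next() {
        return values[pos++];
    }

    // Batch style: one hasNextBatch()/nextBatch() call pair per batch of up to batchSize points.
    boolean hasNextBatch() {
        return pos < times.length;
    }

    BatchData nextBatch() {
        int end = Math.min(pos + batchSize, times.length);
        BatchData batch = new BatchData(
                Arrays.copyOfRange(times, pos, end),
                Arrays.copyOfRange(values, pos, end));
        pos = end;
        return batch;
    }
}

class BatchReadDemo {
    static double sumByPoint(InMemorySeriesReader reader) {
        double sum = 0;
        while (reader.hasNext()) {
            sum += reader.next();
        }
        return sum;
    }

    static double sumByBatch(InMemorySeriesReader reader) {
        double sum = 0;
        while (reader.hasNextBatch()) {
            BatchData batch = reader.nextBatch();
            // Tight loop over a primitive array; no reader method call per point.
            for (int i = 0; i < batch.size; i++) {
                sum += batch.values[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] t = {1, 2, 3, 4, 5};
        double[] v = {1.0, 2.0, 3.0, 4.0, 5.0};
        System.out.println(sumByPoint(new InMemorySeriesReader(t, v, 2)));  // 15.0
        System.out.println(sumByBatch(new InMemorySeriesReader(t, v, 2)));  // 15.0
    }
}
```

Both loops compute the same result; the batch version makes one call pair per batch rather than per point. How much this helps in practice depends on the batch size and on how the real reader fills its batches.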
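Point 2 mentions replacing `ByteArrayInputStream` with Java NIO (`Channel`, `Buffer`, `MMap`). As a minimal sketch, assuming a hypothetical file layout of fixed-size binary fields (not the actual TsFile format), a region of a file can be read either with a positional `FileChannel.read` or by memory-mapping it with `FileChannel.map`. `NioReadDemo`, `readRegion` and `mapRegion` are hypothetical helpers.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class NioReadDemo {
    // Reads exactly `length` bytes starting at `offset` using positional reads,
    // which do not move the channel's own position.
    static ByteBuffer readRegion(FileChannel channel, long offset, int length) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(length);
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer, offset + buffer.position());
            if (n < 0) {
                throw new IOException("unexpected end of file");
            }
        }
        buffer.flip();  // prepare the buffer for reading
        return buffer;
    }

    // Maps the region into memory; the OS pages the data in on demand.
    static MappedByteBuffer mapRegion(FileChannel channel, long offset, long length) throws IOException {
        return channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("nio-demo", ".bin");
        file.toFile().deleteOnExit();
        // Hypothetical layout: an 8-byte long followed by an 8-byte double.
        ByteBuffer out = ByteBuffer.allocate(16);
        out.putLong(42L).putDouble(3.5);
        out.flip();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            while (out.hasRemaining()) {
                ch.write(out);
            }
        }
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            System.out.println(readRegion(ch, 0, 8).getLong());   // 42
            System.out.println(mapRegion(ch, 8, 8).getDouble());  // 3.5
        }
    }
}
```

Because positional reads leave the channel position untouched, one open channel can serve several readers, which fits well with opening each file only once.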
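Point 3 describes opening each TsFile at most once and closing streams after an idle timeout via an `ExpiredTimeMap`. The Java sketch below is a hypothetical simplification of that idea, not IoTDB's `ExpiredTimeMap`: `FileStreamManager`, `Opener`, `ttlMillis` and the injectable clock are illustrative assumptions. The stream type is generic so that any `Closeable` (for example a `FileChannel`) can be managed.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch: each path is opened at most once while cached, and
// streams idle for longer than ttlMillis are closed.
class FileStreamManager<T extends Closeable> {
    // Hypothetical factory that actually opens a file (e.g. returns a FileChannel).
    interface Opener<S> {
        S open(String path) throws IOException;
    }

    private static final class Entry<S> {
        final S stream;
        long lastAccess;

        Entry(S stream, long lastAccess) {
            this.stream = stream;
            this.lastAccess = lastAccess;
        }
    }

    private final Map<String, Entry<T>> openStreams = new HashMap<>();
    private final Opener<T> opener;
    private final long ttlMillis;
    private final LongSupplier clock;  // injectable for testing; e.g. System::currentTimeMillis

    FileStreamManager(Opener<T> opener, long ttlMillis, LongSupplier clock) {
        this.opener = opener;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached stream for path, opening the file only if it is not already open.
    synchronized T get(String path) throws IOException {
        long now = clock.getAsLong();
        Entry<T> entry = openStreams.get(path);
        if (entry == null) {
            entry = new Entry<>(opener.open(path), now);
            openStreams.put(path, entry);
        }
        entry.lastAccess = now;
        return entry.stream;
    }

    // Closes and forgets every stream idle for longer than ttlMillis; returns how many were closed.
    synchronized int closeExpired() throws IOException {
        long now = clock.getAsLong();
        int closed = 0;
        Iterator<Entry<T>> it = openStreams.values().iterator();
        while (it.hasNext()) {
            Entry<T> entry = it.next();
            if (now - entry.lastAccess > ttlMillis) {
                entry.stream.close();
                it.remove();
                closed++;
            }
        }
        return closed;
    }

    synchronized int openCount() {
        return openStreams.size();
    }
}

class FileStreamManagerDemo {
    public static void main(String[] args) throws IOException {
        long[] now = {0};
        int[] opens = {0};
        // Fake "file" streams that only count how often they are opened.
        FileStreamManager<Closeable> manager = new FileStreamManager<Closeable>(
                path -> {
                    opens[0]++;
                    return () -> { };
                },
                1000, () -> now[0]);
        manager.get("a.tsfile");
        manager.get("a.tsfile");                     // reuses the already-open stream
        System.out.println(opens[0]);                // 1
        now[0] = 5000;                               // past the 1000 ms TTL
        System.out.println(manager.closeExpired());  // 1
    }
}
```

One caveat this sketch ignores: a stream may expire while a long-running query is still using it, so a production manager would likely need reference counting (or similar) before closing a file.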
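Point 4 replaces the Visitor Pattern filter with a direct implementation and avoids computing `sensor_1` twice in `select sensor_0, sensor_1 from device_0 where sensor_1 > 10`. This is a minimal Java sketch, assuming each series is held in memory as a `TreeMap<Long, Double>` keyed by timestamp. The filter object evaluates itself (`satisfy`) instead of being dispatched through a visitor, and each `sensor_1` value is read once and reused for both the predicate and the output column. `ValueFilter`, `GtFilter`, `ResultRow`, `FilterDemo` and `selectWhere` are hypothetical names, not IoTDB classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical direct filter: the predicate object evaluates itself, no visitor dispatch.
interface ValueFilter {
    boolean satisfy(double value);
}

final class GtFilter implements ValueFilter {
    private final double bound;

    GtFilter(double bound) {
        this.bound = bound;
    }

    @Override
    public boolean satisfy(double value) {
        return value > bound;
    }
}

final class ResultRow {
    final long time;
    final Double sensor0;  // null when sensor_0 has no point at this timestamp
    final double sensor1;

    ResultRow(long time, Double sensor0, double sensor1) {
        this.time = time;
        this.sensor0 = sensor0;
        this.sensor1 = sensor1;
    }

    @Override
    public String toString() {
        return time + ": sensor_0=" + sensor0 + ", sensor_1=" + sensor1;
    }
}

class FilterDemo {
    // Models: select sensor_0, sensor_1 from device_0 where sensor_1 > 10
    static List<ResultRow> selectWhere(TreeMap<Long, Double> sensor0,
                                       TreeMap<Long, Double> sensor1,
                                       ValueFilter sensor1Filter) {
        List<ResultRow> rows = new ArrayList<>();
        // Single pass over sensor_1: each value is read once and used for both
        // the predicate and the output column, instead of being fetched again.
        for (Map.Entry<Long, Double> e : sensor1.entrySet()) {
            double v1 = e.getValue();
            if (!sensor1Filter.satisfy(v1)) {
                continue;
            }
            rows.add(new ResultRow(e.getKey(), sensor0.get(e.getKey()), v1));
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<Long, Double> s0 = new TreeMap<>();
        TreeMap<Long, Double> s1 = new TreeMap<>();
        s0.put(1L, 0.5);
        s0.put(2L, 0.6);
        s1.put(1L, 5.0);
        s1.put(2L, 15.0);
        s1.put(3L, 20.0);
        // Prints "2: sensor_0=0.6, sensor_1=15.0" then "3: sensor_0=null, sensor_1=20.0"
        for (ResultRow row : selectWhere(s0, s1, new GtFilter(10))) {
            System.out.println(row);
        }
    }
}
```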
