I think the biggest issue with the current master is that the package structure is chaotic, which makes it hard for new developers to understand the project. It is a villain like Thanos in the Marvel Universe. That's why the new branch is called kill_Thanos.
Besides what Gaofei mentioned, the storage module, TsFile, has also been refactored, and the file format has some changes. A brief introduction is at
https://github.com/thulab/iotdb/wiki/%5BTsFile%5D-What-is-new-from-v0.7.0--to-Kill_Thanos

In the kill_Thanos branch, the package structure is clearer, but there is still a lot of code that could be refactored further. However, merging the modifications from master into kill_Thanos takes extra work.

Because all unit tests (UT) and integration tests (IT) in the current kill_Thanos pass, and the performance is better, I agree to merge the branches as soon as possible.

Best,
-----------------------------------
Xiangdong Huang (黄向东)
School of Software, Tsinghua University

On Sat, Jan 5, 2019 at 12:23 AM, Gaofei Cao <[email protected]> wrote:

> The kill_thanos branch in thulab/iotdb has refactored most features, as
> described below.
>
> 1. Replacing the single-point calculation logic with batch data loading.
> In the previous branch, the two most important methods in the `Reader` of
> IoTDB are `hasNext` and `next`, which check whether the given query series
> has a next point and compute that point. Invoking these two methods once
> per point hurts query performance, so we added two new methods,
> `hasNextBatch` and `nextBatch`. As a result, we load and transfer data in
> batches rather than one point at a time. These two methods are friendlier
> to the CPU.
>
> 2. Using NIO.
> In this branch, we replaced ByteArrayInputStream with NIO to take
> advantage of Java NIO. We now use `Channel`, `Buffer`, and `MMap` more
> frequently.
>
> 3. Adding a file stream manager.
> A query in IoTDB may cover multiple series, as in the SQL
> `select * from root.vehicle`. To avoid opening one tsfile multiple times,
> we adopted a file stream manager, which ensures that one file is opened
> at most once across IoTDB queries. We use an `ExpiredTimeMap` to manage
> opened file streams, and close files that have not been used for a given
> expiration time.
> Maybe there are better ways to manage file stream readers; I will keep
> tracking this.
>
> 4. Optimizing filter efficiency.
> First, we removed the previous `Visitor Pattern` implementation of the
> filter and adopted a more intuitive implementation.
> Second, we optimized some filter logic to improve performance. For
> example, in the SQL
> `select sensor_0, sensor_1 from device_0 where sensor_1 > 10`,
> we avoid computing the data of `sensor_1` twice.
>
> 5. Others, such as removing Thrift serialization and changing the TsFile
> file format. Maybe someone else can add to this list.
>
> I suggest merging it into the master branch next week.
>
> Experimental results show that the query test in the kill_thanos branch
> has a performance improvement of approximately 30% ~ 60%.
>
> By the way, I am considering how to get standard, convincing test data
> (in the IoT domain) to test the write and query performance of IoTDB.
> Currently, we just use the data generated by `IoTDB Benchmark` (another
> project, also available at github.com/thulab/iotdb-benchmark), which
> generates 100,000 rows of records for 100 devices * 100 sensors.
>
> Thanks & Best Regards
>
> -----------------------------------
> Cao Gaofei (曹高飞)
> School of Software,
> Tsinghua University
> -----------------------------------
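
To make item 1 of the quoted message concrete, here is a minimal Java sketch of the difference between point-at-a-time and batch reading. Only the method names `hasNext`, `next`, `hasNextBatch` and `nextBatch` come from the message; `BatchReaderSketch`, `Point`, `ListReader`, `sample` and the batch size are hypothetical and are not the actual IoTDB classes or signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

class BatchReaderSketch {

    /** One (time, value) point of a series; a hypothetical stand-in type. */
    static final class Point {
        final long time;
        final double value;
        Point(long time, double value) { this.time = time; this.value = value; }
    }

    /** Hypothetical reader exposing both access styles named in the message. */
    interface Reader {
        boolean hasNext();
        Point next();
        boolean hasNextBatch();
        List<Point> nextBatch();
    }

    /** In-memory reader that serves points from a list in fixed-size batches. */
    static final class ListReader implements Reader {
        private final List<Point> points;
        private final int batchSize;
        private int pos = 0;

        ListReader(List<Point> points, int batchSize) {
            if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
            this.points = points;
            this.batchSize = batchSize;
        }

        public boolean hasNext() { return pos < points.size(); }

        public Point next() {
            if (!hasNext()) throw new NoSuchElementException();
            return points.get(pos++);
        }

        public boolean hasNextBatch() { return hasNext(); }

        public List<Point> nextBatch() {
            if (!hasNext()) throw new NoSuchElementException();
            int end = Math.min(pos + batchSize, points.size());
            List<Point> batch = new ArrayList<>(points.subList(pos, end));
            pos = end;
            return batch;
        }
    }

    /** Point-at-a-time consumption: two reader calls per point. */
    static double sumByPoint(Reader reader) {
        double sum = 0;
        while (reader.hasNext()) sum += reader.next().value;
        return sum;
    }

    /** Batch consumption: two reader calls per batch, then a tight local loop. */
    static double sumByBatch(Reader reader) {
        double sum = 0;
        while (reader.hasNextBatch()) {
            for (Point p : reader.nextBatch()) sum += p.value;
        }
        return sum;
    }

    /** Builds n points with time i and value i, for i = 0 .. n-1. */
    static List<Point> sample(int n) {
        List<Point> points = new ArrayList<>();
        for (int i = 0; i < n; i++) points.add(new Point(i, i));
        return points;
    }

    public static void main(String[] args) {
        System.out.println(sumByPoint(new ListReader(sample(10), 4)));
        System.out.println(sumByBatch(new ListReader(sample(10), 4)));
    }
}
```

Both loops compute the same sum. The batch version simply makes fewer calls through the reader interface, which is the effect the message describes as being friendlier to the CPU.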
