Hi,

Things became more complicated once the load-file feature was introduced in IoTDB. A newly loaded data file may contain many devices, each with its own time interval. The same file can therefore hold earlier data than another file for one device but later data for a different device. As a result, a single fixed order of TsFileResources is insufficient. A possible solution is to sort the TsFileResources temporarily when querying.
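The idea of sorting at query time, rather than keeping one global order, can be sketched in Java. This is a minimal illustration under stated assumptions, not IoTDB code. `FileResource`, `BY_FILE_NAME` and `sortForQuery` are hypothetical names. `BY_FILE_NAME` only imitates the name-based ordering of `compareFileName` (creation millis, then version). The files `x.tsfile`/`y.tsfile` and devices `root.sg.d1`/`root.sg.d2` are invented to show the multi-device case. Records require Java 16+.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class QueryTimeFileOrder {

  // Hypothetical, simplified stand-in for TsFileResource: the file name plus a
  // [startTime, endTime] range for every device the file contains.
  record FileResource(String fileName, Map<String, long[]> deviceRanges) {
    boolean contains(String device) {
      return deviceRanges.containsKey(device);
    }

    long startTime(String device) {
      return deviceRanges.get(device)[0];
    }
  }

  // Orders files by "<millis>-<version>-..." in the name, in the spirit of
  // compareFileName: creation time first, then version.
  static final Comparator<FileResource> BY_FILE_NAME =
      Comparator.comparingLong((FileResource r) -> Long.parseLong(r.fileName().split("-")[0]))
          .thenComparingLong(r -> Long.parseLong(r.fileName().split("-")[1]));

  // Query-time ordering: keep only the files holding the queried device and
  // sort them by that device's start time. No global order is stored.
  static List<FileResource> sortForQuery(List<FileResource> files, String device) {
    return files.stream()
        .filter(r -> r.contains(device))
        .sorted(Comparator.comparingLong((FileResource r) -> r.startTime(device)))
        .collect(Collectors.toList());
  }

  static List<String> names(List<FileResource> files) {
    return files.stream().map(FileResource::fileName).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    String wt01 = "root.ln.wf01.wt01";
    List<FileResource> files = new ArrayList<>(List.of(
        new FileResource("1575813520203-101-0.tsfile", Map.of(wt01, new long[] {1, 5})),     // (a)
        new FileResource("1575813520669-103-0.tsfile", Map.of(wt01, new long[] {100, 300})), // (b)
        new FileResource("1575813521063-105-0.tsfile", Map.of(wt01, new long[] {10, 50})))); // (c)

    files.sort(BY_FILE_NAME);
    System.out.println(names(files));                     // order (a), (b), (c)
    System.out.println(names(sortForQuery(files, wt01))); // order (a), (c), (b)

    // Hypothetical loaded files x and y whose devices interleave in opposite
    // directions: no single file order is correct for both devices.
    FileResource x = new FileResource("x.tsfile",
        Map.of("root.sg.d1", new long[] {1, 10}, "root.sg.d2", new long[] {100, 200}));
    FileResource y = new FileResource("y.tsfile",
        Map.of("root.sg.d1", new long[] {20, 30}, "root.sg.d2", new long[] {1, 50}));
    System.out.println(names(sortForQuery(List.of(x, y), "root.sg.d1"))); // [x.tsfile, y.tsfile]
    System.out.println(names(sortForQuery(List.of(x, y), "root.sg.d2"))); // [y.tsfile, x.tsfile]
  }
}
```

The trade-off in this sketch is an O(n log n) sort over the files containing the queried device on every query, in exchange for never having to maintain one persisted order that cannot fit all devices.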
Thanks,
Jialin Qiao

On Mon, Dec 9, 2019 at 12:14 AM, Lei Rui (Jira) <[email protected]> wrote:

> Lei Rui created IOTDB-346:
> -----------------------------
>
> Summary: StorageGroupProcessor.sequenceFileList is ordered by fileName rather than dataTime
> Key: IOTDB-346
> URL: https://issues.apache.org/jira/browse/IOTDB-346
> Project: Apache IoTDB
> Issue Type: Bug
> Reporter: Lei Rui
>
>
> `StorageGroupProcessor.sequenceFileList` is ordered by fileName rather than by the time of its data, as reflected in the code of the `StorageGroupProcessor.getAllFiles` method:
> {code:java}
> tsFiles.sort(this::compareFileName);
> {code}
> ----
> I use the following examples to expose the bug that occurs when the order of fileName is inconsistent with that of dataTime.
>
> First, for preparation, I created three tsfiles using the following SQL:
> {code:java}
> SET STORAGE GROUP TO root.ln.wf01.wt01
> CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN
> CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=DOUBLE, ENCODING=PLAIN
> CREATE TIMESERIES root.ln.wf01.wt01.hardware WITH DATATYPE=INT32, ENCODING=PLAIN
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(1, 1.1, false, 11)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(2, 2.2, true, 22)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(3, 3.3, false, 33)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(4, 4.4, false, 44)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(5, 5.5, false, 55)
> flush
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(100, 100.1, false, 110)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(150, 200.2, true, 220)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(200, 300.3, false, 330)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(250, 400.4, false, 440)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(300, 500.5, false, 550)
> flush
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(10, 10.1, false, 110)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(20, 20.2, true, 220)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(30, 30.3, false, 330)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(40, 40.4, false, 440)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(50, 50.5, false, 550)
> flush
> {code}
> The tsfiles created are organized in the following directory structure:
> {code:java}
> |data
> |--sequence
> |----root.ln.wf01.wt01
> |------1575813520203-101-0.tsfile
> |------1575813520203-101-0.tsfile.resource
> |------1575813520669-103-0.tsfile
> |------1575813520669-103-0.tsfile.resource
> |--unsequence
> |----root.ln.wf01.wt01
> |------1575813521063-105-0.tsfile
> |------1575813521063-105-0.tsfile.resource
> {code}
> ||File Name||Data Time||
> |(a) 1575813520203-101-0.tsfile|1-5|
> |(c) 1575813521063-105-0.tsfile|10-50|
> |(b) 1575813520669-103-0.tsfile|100-300|
>
> Note how the order of fileName is inconsistent with that of dataTime.
>
> By the way, the code shows how the file name is generated:
> {code:java}
> System.currentTimeMillis() + IoTDBConstant.TSFILE_NAME_SEPARATOR
>     + versionController.nextVersion() + IoTDBConstant.TSFILE_NAME_SEPARATOR + "0"
>     + TSFILE_SUFFIX
> {code}
> ----
> Then, I loaded the three tsfiles into another brand-new IoTDB instance. I ran two experiments, each with a different loading order.
>
> In the first experiment, the tsfiles were loaded in their data time order.
> That is,
> {code:java}
> IoTDB> load 1575813520203-101-0.tsfile // tsfile (a), with data time 1-5
> IoTDB> load 1575813521063-105-0.tsfile // tsfile (c), with data time 10-50
> IoTDB> load 1575813520669-103-0.tsfile // tsfile (b), with data time 100-300
> {code}
> After loading successfully, I ran the following query in the same client window and got the wrong result:
> {code:java}
> IoTDB> select * from root
> +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> |                               Time|root.ln.wf01.wt01.temperature|     root.ln.wf01.wt01.status|   root.ln.wf01.wt01.hardware|
> +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> |      1970-01-01T08:00:00.001+08:00|                          1.1|                        false|                           11|
> |      1970-01-01T08:00:00.002+08:00|                          2.2|                         true|                           22|
> |      1970-01-01T08:00:00.003+08:00|                          3.3|                        false|                           33|
> |      1970-01-01T08:00:00.004+08:00|                          4.4|                        false|                           44|
> |      1970-01-01T08:00:00.005+08:00|                          5.5|                        false|                           55|
> |      1970-01-01T08:00:00.100+08:00|                        100.1|                        false|                          110|
> |      1970-01-01T08:00:00.150+08:00|                        200.2|                         true|                          220|
> |      1970-01-01T08:00:00.200+08:00|                        300.3|                        false|                          330|
> |      1970-01-01T08:00:00.250+08:00|                        400.4|                        false|                          440|
> |      1970-01-01T08:00:00.300+08:00|                        500.5|                        false|                          550|
> |      1970-01-01T08:00:00.010+08:00|                         10.1|                        false|                          110|
> |      1970-01-01T08:00:00.020+08:00|                         20.2|                         true|                          220|
> |      1970-01-01T08:00:00.030+08:00|                         30.3|                        false|                          330|
> |      1970-01-01T08:00:00.040+08:00|                         40.4|                        false|                          440|
> |      1970-01-01T08:00:00.050+08:00|                         50.5|                        false|                          550|
> +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> Total line number = 15
> It costs 0.198s
> {code}
> I checked the data directory of the loaded server, and it looks like this:
> {code:java}
> |data
> |--sequence
> |----root.ln.wf01.wt01
> |------1575813520203-101-0.tsfile
> |------1575813520203-101-0.tsfile.resource
> |------1575813520669-103-0.tsfile
> |------1575813520669-103-0.tsfile.resource
> |------1575813521063-105-0.tsfile
> |------1575813521063-105-0.tsfile.resource
> |--unsequence
> {code}
> ----
> In the second experiment, the tsfiles were loaded in their file name order. That is,
> {code:java}
> IoTDB> load 1575813520203-101-0.tsfile // tsfile (a), with data time 1-5
> IoTDB> load 1575813520669-103-0.tsfile // tsfile (b), with data time 100-300
> IoTDB> load 1575813521063-105-0.tsfile // tsfile (c), with data time 10-50
> {code}
> Note that I expected tsfile (c) to be loaded into the unsequence data directory.
>
> After loading successfully, I ran the following query in the same client window and got the CORRECT result:
> {code:java}
> IoTDB> select * from root
> +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> |                               Time|root.ln.wf01.wt01.temperature|     root.ln.wf01.wt01.status|   root.ln.wf01.wt01.hardware|
> +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> |      1970-01-01T08:00:00.001+08:00|                          1.1|                        false|                           11|
> |      1970-01-01T08:00:00.002+08:00|                          2.2|                         true|                           22|
> |      1970-01-01T08:00:00.003+08:00|                          3.3|                        false|                           33|
> |      1970-01-01T08:00:00.004+08:00|                          4.4|                        false|                           44|
> |      1970-01-01T08:00:00.005+08:00|                          5.5|                        false|                           55|
> |      1970-01-01T08:00:00.010+08:00|                         10.1|                        false|                          110|
> |      1970-01-01T08:00:00.020+08:00|                         20.2|                         true|                          220|
> |      1970-01-01T08:00:00.030+08:00|                         30.3|                        false|                          330|
> |      1970-01-01T08:00:00.040+08:00|                         40.4|                        false|                          440|
> |      1970-01-01T08:00:00.050+08:00|                         50.5|                        false|                          550|
> |      1970-01-01T08:00:00.100+08:00|                        100.1|                        false|                          110|
> |      1970-01-01T08:00:00.150+08:00|                        200.2|                         true|                          220|
> |      1970-01-01T08:00:00.200+08:00|                        300.3|                        false|                          330|
> |      1970-01-01T08:00:00.250+08:00|                        400.4|                        false|                          440|
> |      1970-01-01T08:00:00.300+08:00|                        500.5|                        false|                          550|
> +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> Total line number = 15
> It costs 0.267s
> {code}
> I looked into the data directory of the loaded server and, surprisingly, it is the same as in the first experiment. Further, in the second experiment, I restarted the server and the client and queried again. This time, the result was wrong again, the same as in the first experiment.
>
> *There is a particularly confusing point in the second experiment*: why was tsfile (c) not loaded as an unsequence tsfile? And why did the query executed immediately after the three tsfiles were loaded return the CORRECT result?
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)

--
—————————————————
Jialin Qiao
School of Software, Tsinghua University
乔嘉林
