Hi,

> so I think it should be dumped into IoTDB as an unseq file and sorted in memory with the original files.

I do not think so. Putting the file into the unseq folder will decrease query speed (at least with the current implementation, as far as I know).
In my opinion, if a part of the data (note that I am not saying a file; I mean a part of the data) can be considered ordered data (i.e., sequence data), putting it into the seq folder may be friendlier to queries.

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院

atoiLiu <[email protected]> wrote on Tuesday, December 10, 2019, 11:16 PM:

> Hi,
>
> I think the semantics of load are the same as those of insert, except that
> this insert is a sealed file, so I think it should be dumped into IoTDB as
> an unseq file and sorted in memory with the original files.
>
> This may make queries very slow, but should we prompt the user to run a
> merge command?
>
>
> On December 10, 2019, at 9:04 PM, Xiangdong Huang <[email protected]> wrote:
> >
> > Hi,
> >
> > I think this is a bug in the current `load` function, and it needs to be
> > fixed quickly.
> >
> > First, let's consider the case where there is no `load` function.
> > In this case, the files have the same order no matter which device's
> > timeline you use as the ordering dimension.
> >
> > (Second, in your case, can we put tsfile 105 into the sequence files?
> > Condition: every device in a flushing memtable fits into a time hole of
> > the sequence files.)
> >
> > Third, let's consider the case where the `load` function is enabled.
> >
> > The worst case is that you add a file with two devices (device1 and
> > device2): if you use device1's timeline to order the files, the new file
> > falls between F2 and F3, while it falls between F1 and F2 if you use
> > device2's timeline.
> >
> > device1: F1 F2 _HOLE__ F3
> > device2: F1 __HOLE__ F2 F3
> >
> > Then, why not split the file into two files?
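The hole condition and the device1/device2 worst case can be made concrete with a small placement check. This is a minimal sketch, not IoTDB code: the class `SeqHoleFinder`, the `Range` record and `findSequencePosition` are hypothetical names, each sequence file is simplified to a map from device to its inclusive time range, and the time values are made up to reproduce the F1/F2/F3 picture. The method returns the one list index at which the new file sits in a hole for every device, or -1 when the devices disagree or a range overlaps.

```java
import java.util.List;
import java.util.Map;

class SeqHoleFinder {

  /** Inclusive time range of one device inside one file (hypothetical helper type). */
  record Range(long start, long end) {}

  /**
   * Returns the index in seqFiles at which newFile can be inserted so that, for every
   * device, files before the index hold only earlier data and files from the index on
   * hold only later data; returns -1 if no single index satisfies all devices.
   */
  static int findSequencePosition(List<Map<String, Range>> seqFiles, Map<String, Range> newFile) {
    int lo = 0;               // smallest insertion index still allowed
    int hi = seqFiles.size(); // largest insertion index still allowed
    for (Map.Entry<String, Range> entry : newFile.entrySet()) {
      Range incoming = entry.getValue();
      for (int i = 0; i < seqFiles.size(); i++) {
        Range existing = seqFiles.get(i).get(entry.getKey());
        if (existing == null) {
          continue; // file i has no data for this device, so it does not constrain the position
        }
        if (existing.end() < incoming.start()) {
          lo = Math.max(lo, i + 1); // file i must stay before the new file
        } else if (existing.start() > incoming.end()) {
          hi = Math.min(hi, i); // file i must stay after the new file
        } else {
          return -1; // overlapping time ranges: no hole for this device
        }
      }
    }
    return lo <= hi ? lo : -1;
  }

  // device1: F1 F2 _HOLE_ F3   device2: F1 _HOLE_ F2 F3   (made-up time ranges)
  static final List<Map<String, Range>> SEQ = List.of(
      Map.of("device1", new Range(1, 10), "device2", new Range(1, 10)),    // F1
      Map.of("device1", new Range(11, 20), "device2", new Range(31, 40)),  // F2
      Map.of("device1", new Range(31, 40), "device2", new Range(41, 50))); // F3

  public static void main(String[] args) {
    Range hole = new Range(21, 30);
    System.out.println(findSequencePosition(SEQ, Map.of("device1", hole, "device2", hole))); // -1
    System.out.println(findSequencePosition(SEQ, Map.of("device1", hole))); // 2
    System.out.println(findSequencePosition(SEQ, Map.of("device2", hole))); // 1
  }
}
```

With these sample ranges the two-device file gets -1, while each single-device half fits (index 2 for device1, index 1 for device2), which is the situation where splitting the file per device would let both parts go into the sequence list.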
> >
> > Best,
> > -----------------------------------
> > Xiangdong Huang
> > School of Software, Tsinghua University
> >
> > 黄向东
> > 清华大学 软件学院
> >
> >
> > Jialin Qiao <[email protected]> wrote on Tuesday, December 10, 2019, 7:05 PM:
> >
> >> Hi,
> >>
> >> Things become complicated once the load-file feature is introduced in
> >> IoTDB. A newly added data file may contain many devices with different
> >> time intervals, so a single order of TsFileResources is insufficient.
> >> A possible solution is to sort the TsFileResources temporarily when
> >> querying.
> >>
> >> Thanks,
> >> Jialin Qiao
> >>
> >> Lei Rui (Jira) <[email protected]> wrote on Monday, December 9, 2019, 12:14 AM:
> >>
> >>> Lei Rui created IOTDB-346:
> >>> -----------------------------
> >>>
> >>>     Summary: StorageGroupProcessor.sequenceFileList is ordered by fileName rather than dataTime
> >>>         Key: IOTDB-346
> >>>         URL: https://issues.apache.org/jira/browse/IOTDB-346
> >>>     Project: Apache IoTDB
> >>>  Issue Type: Bug
> >>>    Reporter: Lei Rui
> >>>
> >>>
> >>> `StorageGroupProcessor.sequenceFileList` is ordered by fileName rather
> >>> than by data time, as reflected in the
> >>> `StorageGroupProcessor.getAllFiles` method code:
> >>> {code:java}
> >>> tsFiles.sort(this::compareFileName);
> >>> {code}
> >>> ----
> >>> I use the following examples to expose the bug when the order of fileName
> >>> is inconsistent with that of dataTime.
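The stored file-name order and Jialin's query-time sort can be contrasted in a short sketch. It is an illustration under assumptions, not IoTDB source: `Resource` is a hypothetical stand-in for `TsFileResource` that keeps only a per-device start time, `compareFileName` is re-created on the assumption that names follow the `{millis}-{version}-0.tsfile` pattern seen in this report (compare the millisecond part first, then the version), and `sortForQuery` is one possible reading of "sort the TsFileResources temporarily when querying". The names and start times are those of tsfiles (a), (b) and (c) from the report.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class ResourceOrdering {

  static final String DEVICE = "root.ln.wf01.wt01";

  /** Hypothetical stand-in for TsFileResource: file name plus start time per device. */
  record Resource(String fileName, Map<String, Long> startTimes) {}

  /** Assumed name order for "{millis}-{version}-0.tsfile": millis first, then version. */
  static int compareFileName(Resource r1, Resource r2) {
    String[] a = r1.fileName().replace(".tsfile", "").split("-");
    String[] b = r2.fileName().replace(".tsfile", "").split("-");
    int byMillis = Long.compare(Long.parseLong(a[0]), Long.parseLong(b[0]));
    return byMillis != 0 ? byMillis : Long.compare(Long.parseLong(a[1]), Long.parseLong(b[1]));
  }

  /** Query-time order: a sorted copy of the files containing the queried device. */
  static List<Resource> sortForQuery(List<Resource> resources, String device) {
    List<Resource> result = new ArrayList<>();
    for (Resource r : resources) {
      if (r.startTimes().containsKey(device)) {
        result.add(r);
      }
    }
    result.sort(Comparator.comparingLong((Resource r) -> r.startTimes().get(device)));
    return result;
  }

  static List<Long> startTimes(List<Resource> resources) {
    return resources.stream().map(r -> r.startTimes().get(DEVICE)).toList();
  }

  // tsfiles (c), (b), (a) from the report, deliberately listed out of order
  static final List<Resource> FILES = List.of(
      new Resource("1575813521063-105-0.tsfile", Map.of(DEVICE, 10L)),  // (c)
      new Resource("1575813520669-103-0.tsfile", Map.of(DEVICE, 100L)), // (b)
      new Resource("1575813520203-101-0.tsfile", Map.of(DEVICE, 1L)));  // (a)

  public static void main(String[] args) {
    List<Resource> byName = new ArrayList<>(FILES);
    byName.sort(ResourceOrdering::compareFileName);
    System.out.println(startTimes(byName));                      // [1, 100, 10]
    System.out.println(startTimes(sortForQuery(FILES, DEVICE))); // [1, 10, 100]
  }
}
```

Sorting by name yields (a), (b), (c) with start times 1, 100, 10, whereas the per-device sort yields (a), (c), (b). Because the query-time sort works on a copy, each query can use the order of its own device without changing the stored list.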
> >>>
> >>> First, for preparation, I created three tsfiles using the following SQL:
> >>> {code:java}
> >>> SET STORAGE GROUP TO root.ln.wf01.wt01
> >>> CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN
> >>> CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=DOUBLE, ENCODING=PLAIN
> >>> CREATE TIMESERIES root.ln.wf01.wt01.hardware WITH DATATYPE=INT32, ENCODING=PLAIN
> >>> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(1, 1.1, false, 11)
> >>> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(2, 2.2, true, 22)
> >>> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(3, 3.3, false, 33)
> >>> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(4, 4.4, false, 44)
> >>> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(5, 5.5, false, 55)
> >>> flush
> >>> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(100, 100.1, false, 110)
> >>> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(150, 200.2, true, 220)
> >>> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(200, 300.3, false, 330)
> >>> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(250, 400.4, false, 440)
> >>> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(300, 500.5, false, 550)
> >>> flush
> >>> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(10, 10.1, false, 110)
> >>> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(20, 20.2, true, 220)
> >>> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(30, 30.3, false, 330)
> >>> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(40, 40.4, false, 440)
> >>> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware) values(50, 50.5, false, 550)
> >>> flush
> >>> {code}
> >>> The tsfiles created are organized in the following directory structure:
> >>> {code:java}
> >>> |data
> >>> |--sequence
> >>> |----root.ln.wf01.wt01
> >>> |------1575813520203-101-0.tsfile
> >>> |------1575813520203-101-0.tsfile.resource
> >>> |------1575813520669-103-0.tsfile
> >>> |------1575813520669-103-0.tsfile.resource
> >>> |--unsequence
> >>> |----root.ln.wf01.wt01
> >>> |------1575813521063-105-0.tsfile
> >>> |------1575813521063-105-0.tsfile.resource
> >>> {code}
> >>> ||File Name||Data Time||
> >>> |(a) 1575813520203-101-0.tsfile|1-5|
> >>> |(c) 1575813521063-105-0.tsfile|10-50|
> >>> |(b) 1575813520669-103-0.tsfile|100-300|
> >>>
> >>> Note how the order of fileName is inconsistent with that of dataTime.
> >>>
> >>> By the way, if you look into the code, you will see how the file name is
> >>> generated:
> >>> {code:java}
> >>> System.currentTimeMillis() + IoTDBConstant.TSFILE_NAME_SEPARATOR +
> >>> versionController.nextVersion() + IoTDBConstant.TSFILE_NAME_SEPARATOR + "0"
> >>> + TSFILE_SUFFIX
> >>> {code}
> >>> ----
> >>> Then, I loaded the three tsfiles into another, brand-new IoTDB instance. I did
> >>> two experiments, each with a different loading order.
> >>>
> >>> In the first experiment, the tsfiles were loaded in their data time order.
> >>> That is, > >>> {code:java} > >>> IoTDB> load 1575813520203-101-0.tsfile // tsfile (a), with data time > 1-5 > >>> IoTDB> load 1575813521063-105-0.tsfile // tsfile (c), with data time > >> 10-50 > >>> IoTDB> load 1575813520669-103-0.tsfile // tsfile (b), with data time > >>> 100-300{code} > >>> After loading successfully, I did the following query in the same > client > >>> window and got the wrong result: > >>> {code:java} > >>> IoTDB> select * from root > >>> > >>> > >> > +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+ > >>> | Time|root.ln.wf01.wt01.temperature| > >>> root.ln.wf01.wt01.status| root.ln.wf01.wt01.hardware| > >>> > >>> > >> > +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+ > >>> | 1970-01-01T08:00:00.001+08:00| 1.1| > >>> false| 11| > >>> | 1970-01-01T08:00:00.002+08:00| 2.2| > >>> true| 22| > >>> | 1970-01-01T08:00:00.003+08:00| 3.3| > >>> false| 33| > >>> | 1970-01-01T08:00:00.004+08:00| 4.4| > >>> false| 44| > >>> | 1970-01-01T08:00:00.005+08:00| 5.5| > >>> false| 55| > >>> | 1970-01-01T08:00:00.100+08:00| 100.1| > >>> false| 110| > >>> | 1970-01-01T08:00:00.150+08:00| 200.2| > >>> true| 220| > >>> | 1970-01-01T08:00:00.200+08:00| 300.3| > >>> false| 330| > >>> | 1970-01-01T08:00:00.250+08:00| 400.4| > >>> false| 440| > >>> | 1970-01-01T08:00:00.300+08:00| 500.5| > >>> false| 550| > >>> | 1970-01-01T08:00:00.010+08:00| 10.1| > >>> false| 110| > >>> | 1970-01-01T08:00:00.020+08:00| 20.2| > >>> true| 220| > >>> | 1970-01-01T08:00:00.030+08:00| 30.3| > >>> false| 330| > >>> | 1970-01-01T08:00:00.040+08:00| 40.4| > >>> false| 440| > >>> | 1970-01-01T08:00:00.050+08:00| 50.5| > >>> false| 550| > >>> > >>> > >> > +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+ > >>> Total line number = 15 > >>> It costs 0.198s > 
> >>> {code}
> >>> I checked the data directory of the loaded server, and it looks like this:
> >>> {code:java}
> >>> |data
> >>> |--sequence
> >>> |----root.ln.wf01.wt01
> >>> |------1575813520203-101-0.tsfile
> >>> |------1575813520203-101-0.tsfile.resource
> >>> |------1575813520669-103-0.tsfile
> >>> |------1575813520669-103-0.tsfile.resource
> >>> |------1575813521063-105-0.tsfile
> >>> |------1575813521063-105-0.tsfile.resource
> >>> |--unsequence
> >>> {code}
> >>> ----
> >>> In the second experiment, the tsfiles were loaded in their file name
> >>> order. That is,
> >>> {code:java}
> >>> IoTDB> load 1575813520203-101-0.tsfile // tsfile (a), with data time 1-5
> >>> IoTDB> load 1575813520669-103-0.tsfile // tsfile (b), with data time 100-300
> >>> IoTDB> load 1575813521063-105-0.tsfile // tsfile (c), with data time 10-50
> >>> {code}
> >>> Note that I expected tsfile (c) to be loaded into the unsequence
> >>> data directory.
> >>>
> >>> After loading successfully, I ran the following query in the same client
> >>> window and got the CORRECT result:
> >>> {code:java}
> >>> IoTDB> select * from root
> >>> +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> >>> |                               Time|root.ln.wf01.wt01.temperature|     root.ln.wf01.wt01.status|   root.ln.wf01.wt01.hardware|
> >>> +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> >>> |      1970-01-01T08:00:00.001+08:00|                          1.1|                        false|                           11|
> >>> |      1970-01-01T08:00:00.002+08:00|                          2.2|                         true|                           22|
> >>> |      1970-01-01T08:00:00.003+08:00|                          3.3|                        false|                           33|
> >>> |      1970-01-01T08:00:00.004+08:00|                          4.4|                        false|                           44|
> >>> |      1970-01-01T08:00:00.005+08:00|                          5.5|                        false|                           55|
> >>> |      1970-01-01T08:00:00.010+08:00|                         10.1|                        false|                          110|
> >>> |      1970-01-01T08:00:00.020+08:00|                         20.2|                         true|                          220|
> >>> |      1970-01-01T08:00:00.030+08:00|                         30.3|                        false|                          330|
> >>> |      1970-01-01T08:00:00.040+08:00|                         40.4|                        false|                          440|
> >>> |      1970-01-01T08:00:00.050+08:00|                         50.5|                        false|                          550|
> >>> |      1970-01-01T08:00:00.100+08:00|                        100.1|                        false|                          110|
> >>> |      1970-01-01T08:00:00.150+08:00|                        200.2|                         true|                          220|
> >>> |      1970-01-01T08:00:00.200+08:00|                        300.3|                        false|                          330|
> >>> |      1970-01-01T08:00:00.250+08:00|                        400.4|                        false|                          440|
> >>> |      1970-01-01T08:00:00.300+08:00|                        500.5|                        false|                          550|
> >>> +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> >>> Total line number = 15
> >>> It costs 0.267s
> >>> {code}
> >>> I looked into the data directory of the loaded server, and surprisingly it
> >>> is the same as in the first experiment. Later in the second experiment, I
> >>> restarted the server and the client and queried again. This time, the
> >>> result was wrong again, just as in the first experiment.
> >>>
> >>> *There are two particularly confusing points in the second experiment*: why
> >>> was tsfile (c) not loaded as an unsequence tsfile? And why did the query
> >>> executed immediately after the three tsfiles were loaded get the CORRECT
> >>> result?
> >>>
> >>>
> >>>
> >>> --
> >>> This message was sent by Atlassian Jira
> >>> (v8.3.4#803005)
> >>>
> >>
> >>
> >> --
> >> —————————————————
> >> Jialin Qiao
> >> School of Software, Tsinghua University
> >>
> >> 乔嘉林
> >> 清华大学 软件学院
> >>
> >
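In both wrong results, the rows of tsfiles (a), (b) and (c) appear back to back, which is what concatenating the three files in file-name order produces. The following Java sketch contrasts that with a k-way merge by timestamp using `java.util.PriorityQueue`, which yields time order however the file list is sorted. It illustrates a general technique and is not IoTDB's query path; `TimeMerge`, `Point`, `Cursor` and the helper methods are hypothetical, and only the temperature column from the report is used.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TimeMerge {

  /** One (time, value) point; hypothetical simplification of a series row. */
  record Point(long time, double value) {}

  /** Position inside one file's points, which are sorted by time within that file. */
  private record Cursor(List<Point> points, int index) {
    Point head() {
      return points.get(index);
    }
  }

  /** Naive read: concatenate files in list order, trusting the list to be time-ordered. */
  static List<Point> concatenate(List<List<Point>> files) {
    List<Point> out = new ArrayList<>();
    files.forEach(out::addAll);
    return out;
  }

  /** K-way merge by timestamp: the result is time-ordered whatever the list order is. */
  static List<Point> merge(List<List<Point>> files) {
    PriorityQueue<Cursor> heap =
        new PriorityQueue<>(Comparator.comparingLong((Cursor c) -> c.head().time()));
    for (List<Point> file : files) {
      if (!file.isEmpty()) {
        heap.add(new Cursor(file, 0));
      }
    }
    List<Point> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor c = heap.poll();
      out.add(c.head());
      if (c.index() + 1 < c.points().size()) {
        heap.add(new Cursor(c.points(), c.index() + 1)); // advance within the same file
      }
    }
    return out;
  }

  static List<Long> times(List<Point> points) {
    return points.stream().map(Point::time).toList();
  }

  // Temperature column of tsfiles (a), (b), (c), listed in file-name order.
  static final List<List<Point>> FILES = List.of(
      List.of(new Point(1, 1.1), new Point(2, 2.2), new Point(3, 3.3), new Point(4, 4.4), new Point(5, 5.5)),
      List.of(new Point(100, 100.1), new Point(150, 200.2), new Point(200, 300.3), new Point(250, 400.4), new Point(300, 500.5)),
      List.of(new Point(10, 10.1), new Point(20, 20.2), new Point(30, 30.3), new Point(40, 40.4), new Point(50, 50.5)));

  public static void main(String[] args) {
    System.out.println(times(concatenate(FILES))); // [1, 2, 3, 4, 5, 100, 150, 200, 250, 300, 10, 20, 30, 40, 50]
    System.out.println(times(merge(FILES)));       // [1, 2, 3, 4, 5, 10, 20, 30, 40, 50, 100, 150, 200, 250, 300]
  }
}
```

The merge costs O(n log k) for n points in k files; skipping it is only safe when the sequence list really is ordered by time for the queried device.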
