Hi,

I think this is a bug in the `load` function, and it needs to be fixed
quickly.

First, let's consider the case where there is no `load` function.
In that case, the files have the same order no matter which device's
timeline you use as the ordering dimension.

(Second, in your case, can we put tsfile 105 into the sequence files?
Condition: every device in a flushing memtable fits into a time hole of
the sequence files.)
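
A rough sketch of that condition in Java, to make it concrete. This is not
IoTDB code: the class, the method names and the map-based file model (device
name mapped to {startTime, endTime}) are all assumptions for illustration.
It only checks whether a new file fits between its neighbours on every
device it contains.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each file maps a device name to {startTime, endTime}.
// None of these names come from the IoTDB code base.
class TimeHoleCheck {

  // True if newFile can sit at index pos of the ordered sequence list
  // without overlapping any other file on any device it contains.
  static boolean fitsAt(
      List<Map<String, long[]>> seqFiles, Map<String, long[]> newFile, int pos) {
    for (Map.Entry<String, long[]> e : newFile.entrySet()) {
      long start = e.getValue()[0];
      long end = e.getValue()[1];
      for (int i = 0; i < seqFiles.size(); i++) {
        long[] other = seqFiles.get(i).get(e.getKey());
        if (other == null) {
          continue; // this file has no data for the device
        }
        if (i < pos && other[1] >= start) {
          return false; // an earlier file must end before the new one starts
        }
        if (i >= pos && other[0] <= end) {
          return false; // a later file must start after the new one ends
        }
      }
    }
    return true;
  }

  // First insert position that works for all devices, or -1 if there is none.
  static int findHole(List<Map<String, long[]>> seqFiles, Map<String, long[]> newFile) {
    for (int pos = 0; pos <= seqFiles.size(); pos++) {
      if (fitsAt(seqFiles, newFile, pos)) {
        return pos;
      }
    }
    return -1;
  }

  // Helper that builds a single-device file.
  static Map<String, long[]> file(String device, long start, long end) {
    Map<String, long[]> m = new HashMap<>();
    m.put(device, new long[] {start, end});
    return m;
  }

  public static void main(String[] args) {
    List<Map<String, long[]>> seq = new ArrayList<>();
    seq.add(file("root.ln.wf01.wt01", 1, 5));     // tsfile 101
    seq.add(file("root.ln.wf01.wt01", 100, 300)); // tsfile 103
    Map<String, long[]> f105 = file("root.ln.wf01.wt01", 10, 50);
    System.out.println(findHole(seq, f105));
  }
}
```

With the time ranges from the issue, tsfile 105 fits between 101 and 103,
so under this model it could be treated as a sequence file.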

Third, let's consider the case where the `load` function is enabled.

The worst case is that you add a file which has two devices (device1 and
device2): if you use device1's timeline to order the files, it falls between
F2 and F3, while it falls between F1 and F2 if you use device2's timeline.

device1: F1   F2   __HOLE__   F3
device2: F1   __HOLE__   F2   F3

Then, why not split the file into two files?
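
To illustrate the worst case, here is a small Java sketch. It is only a toy
model, not IoTDB code: the class and method names are hypothetical, the time
ranges are made up to match the diagram above, and a file is again just a map
from device name to {startTime, endTime}. It computes where the loaded file
would go under each device's timeline, and shows the split into one file per
device.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not taken from the IoTDB code base.
class PerDeviceOrder {

  // Index at which newFile would be inserted if the files were ordered by the
  // given device's timeline: the number of existing files ending before it starts.
  static int positionBy(
      String device, List<Map<String, long[]>> files, Map<String, long[]> newFile) {
    long start = newFile.get(device)[0];
    int pos = 0;
    for (Map<String, long[]> f : files) {
      long[] range = f.get(device);
      if (range != null && range[1] < start) {
        pos++;
      }
    }
    return pos;
  }

  // Splits a multi-device file into one single-device file per device.
  static List<Map<String, long[]>> splitByDevice(Map<String, long[]> file) {
    List<Map<String, long[]>> parts = new ArrayList<>();
    for (Map.Entry<String, long[]> e : file.entrySet()) {
      Map<String, long[]> part = new HashMap<>();
      part.put(e.getKey(), e.getValue());
      parts.add(part);
    }
    return parts;
  }

  // Helper that builds a file holding device1 and device2.
  static Map<String, long[]> file(long s1, long e1, long s2, long e2) {
    Map<String, long[]> m = new HashMap<>();
    m.put("device1", new long[] {s1, e1});
    m.put("device2", new long[] {s2, e2});
    return m;
  }

  public static void main(String[] args) {
    List<Map<String, long[]>> files = new ArrayList<>();
    files.add(file(1, 10, 1, 10));       // F1
    files.add(file(11, 20, 100, 110));   // F2
    files.add(file(100, 110, 111, 120)); // F3
    Map<String, long[]> loaded = file(50, 60, 50, 60);
    System.out.println("by device1: " + positionBy("device1", files, loaded));
    System.out.println("by device2: " + positionBy("device2", files, loaded));
    System.out.println("parts after split: " + splitByDevice(loaded).size());
  }
}
```

The two positions differ, so no single order works for the whole file, while
each single-device part has exactly one valid position.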

Best,
-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Jialin Qiao <[email protected]> wrote on Tue, Dec 10, 2019 at 7:05 PM:

> Hi,
>
> Things become complicated when the load file feature is introduced in
> IoTDB. The newly added data file may contain many devices with different
> time intervals. Therefore, one order of TsFileResources is insufficient.
> A possible solution is to sort the TsFileResources temporarily when
> querying.
>
> Thanks,
> Jialin Qiao
>
> Lei Rui (Jira) <[email protected]> wrote on Mon, Dec 9, 2019 at 12:14 AM:
>
> > Lei Rui created IOTDB-346:
> > -----------------------------
> >
> >              Summary: StorageGroupProcessor.sequenceFileList is ordered by
> > fileName rather than dataTime
> >                  Key: IOTDB-346
> >                  URL: https://issues.apache.org/jira/browse/IOTDB-346
> >              Project: Apache IoTDB
> >           Issue Type: Bug
> >             Reporter: Lei Rui
> >
> >
> > `StorageGroupProcessor.sequenceFileList` is ordered by fileName rather
> > than by time of data, as reflected in the
> > `StorageGroupProcessor.getAllFiles` method code:
> > {code:java}
> > tsFiles.sort(this::compareFileName);
> > {code}
> > ----
> > I use the following examples to expose the bug when the order of fileName
> > is inconsistent with that of dataTime.
> >
> > First, for preparation, I created three tsfiles using the following sql:
> > {code:java}
> > SET STORAGE GROUP TO root.ln.wf01.wt01
> > CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN,
> > ENCODING=PLAIN
> > CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=DOUBLE,
> > ENCODING=PLAIN
> > CREATE TIMESERIES root.ln.wf01.wt01.hardware WITH DATATYPE=INT32,
> > ENCODING=PLAIN
> > INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> > values(1, 1.1, false, 11)
> > INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> > values(2, 2.2, true, 22)
> > INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> > values(3, 3.3, false, 33)
> > INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> > values(4, 4.4, false, 44)
> > INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> > values(5, 5.5, false, 55)
> > flush
> > INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> > values(100, 100.1, false, 110)
> > INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> > values(150, 200.2, true, 220)
> > INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> > values(200, 300.3, false, 330)
> > INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> > values(250, 400.4, false, 440)
> > INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> > values(300, 500.5, false, 550)
> > flush
> > INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> > values(10, 10.1, false, 110)
> > INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> > values(20, 20.2, true, 220)
> > INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> > values(30, 30.3, false, 330)
> > INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> > values(40, 40.4, false, 440)
> > INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> > values(50, 50.5, false, 550)
> > flush
> > {code}
> > The tsfiles created are organized in the following directory structure:
> > {code:java}
> > |data
> > |--sequence
> > |----root.ln.wf01.wt01
> > |------1575813520203-101-0.tsfile
> > |------1575813520203-101-0.tsfile.resource
> > |------1575813520669-103-0.tsfile
> > |------1575813520669-103-0.tsfile.resource
> > |--unsequence
> > |----root.ln.wf01.wt01
> > |------1575813521063-105-0.tsfile
> > |------1575813521063-105-0.tsfile.resource
> > {code}
> > ||File Name||Data Time||
> > |(a) 1575813520203-101-0.tsfile|1-5|
> > |(c) 1575813521063-105-0.tsfile|10-50|
> > |(b) 1575813520669-103-0.tsfile|100-300|
> >
> > Note how the order of fileName is inconsistent with that of dataTime.
> >
> > By the way, if you look into the code, you will know how the file name is
> > generated:
> > {code:java}
> > System.currentTimeMillis() + IoTDBConstant.TSFILE_NAME_SEPARATOR +
> > versionController.nextVersion() + IoTDBConstant.TSFILE_NAME_SEPARATOR + "0"
> > + TSFILE_SUFFIX
> > {code}
> > ----
> > Then, I loaded the three tsfiles into another brand new IoTDB. I did two
> > experiments with different loading orders each.
> >
> > In the first experiment, the tsfiles were loaded in their data time order.
> > That is,
> > {code:java}
> > IoTDB> load 1575813520203-101-0.tsfile // tsfile (a), with data time 1-5
> > IoTDB> load 1575813521063-105-0.tsfile // tsfile (c), with data time 10-50
> > IoTDB> load 1575813520669-103-0.tsfile // tsfile (b), with data time 100-300
> > {code}
> > After loading successfully, I did the following query in the same client
> > window and got the wrong result:
> > {code:java}
> > IoTDB> select * from root
> > +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> > |                               Time|root.ln.wf01.wt01.temperature|     root.ln.wf01.wt01.status|   root.ln.wf01.wt01.hardware|
> > +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> > |      1970-01-01T08:00:00.001+08:00|                          1.1|                        false|                           11|
> > |      1970-01-01T08:00:00.002+08:00|                          2.2|                         true|                           22|
> > |      1970-01-01T08:00:00.003+08:00|                          3.3|                        false|                           33|
> > |      1970-01-01T08:00:00.004+08:00|                          4.4|                        false|                           44|
> > |      1970-01-01T08:00:00.005+08:00|                          5.5|                        false|                           55|
> > |      1970-01-01T08:00:00.100+08:00|                        100.1|                        false|                          110|
> > |      1970-01-01T08:00:00.150+08:00|                        200.2|                         true|                          220|
> > |      1970-01-01T08:00:00.200+08:00|                        300.3|                        false|                          330|
> > |      1970-01-01T08:00:00.250+08:00|                        400.4|                        false|                          440|
> > |      1970-01-01T08:00:00.300+08:00|                        500.5|                        false|                          550|
> > |      1970-01-01T08:00:00.010+08:00|                         10.1|                        false|                          110|
> > |      1970-01-01T08:00:00.020+08:00|                         20.2|                         true|                          220|
> > |      1970-01-01T08:00:00.030+08:00|                         30.3|                        false|                          330|
> > |      1970-01-01T08:00:00.040+08:00|                         40.4|                        false|                          440|
> > |      1970-01-01T08:00:00.050+08:00|                         50.5|                        false|                          550|
> > +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> > Total line number = 15
> > It costs 0.198s
> > {code}
> > I checked the data directory of the loaded server and it looks like this:
> > {code:java}
> > |data
> > |--sequence
> > |----root.ln.wf01.wt01
> > |------1575813520203-101-0.tsfile
> > |------1575813520203-101-0.tsfile.resource
> > |------1575813520669-103-0.tsfile
> > |------1575813520669-103-0.tsfile.resource
> > |------1575813521063-105-0.tsfile
> > |------1575813521063-105-0.tsfile.resource
> > |--unsequence{code}
> > ----
> > In the second experiment, the tsfiles were loaded in their file name
> > order. That is,
> > {code:java}
> > IoTDB> load 1575813520203-101-0.tsfile // tsfile (a), with data time 1-5
> > IoTDB> load 1575813520669-103-0.tsfile // tsfile (b), with data time 100-300
> > IoTDB> load 1575813521063-105-0.tsfile // tsfile (c), with data time 10-50
> > {code}
> > Note that I expected tsfile (c) to be loaded into the unsequence
> > data directory.
> >
> > After loading successfully, I did the following query in the same client
> > window and got the CORRECT result:
> > {code:java}
> > IoTDB> select * from root
> > +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> > |                               Time|root.ln.wf01.wt01.temperature|     root.ln.wf01.wt01.status|   root.ln.wf01.wt01.hardware|
> > +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> > |      1970-01-01T08:00:00.001+08:00|                          1.1|                        false|                           11|
> > |      1970-01-01T08:00:00.002+08:00|                          2.2|                         true|                           22|
> > |      1970-01-01T08:00:00.003+08:00|                          3.3|                        false|                           33|
> > |      1970-01-01T08:00:00.004+08:00|                          4.4|                        false|                           44|
> > |      1970-01-01T08:00:00.005+08:00|                          5.5|                        false|                           55|
> > |      1970-01-01T08:00:00.010+08:00|                         10.1|                        false|                          110|
> > |      1970-01-01T08:00:00.020+08:00|                         20.2|                         true|                          220|
> > |      1970-01-01T08:00:00.030+08:00|                         30.3|                        false|                          330|
> > |      1970-01-01T08:00:00.040+08:00|                         40.4|                        false|                          440|
> > |      1970-01-01T08:00:00.050+08:00|                         50.5|                        false|                          550|
> > |      1970-01-01T08:00:00.100+08:00|                        100.1|                        false|                          110|
> > |      1970-01-01T08:00:00.150+08:00|                        200.2|                         true|                          220|
> > |      1970-01-01T08:00:00.200+08:00|                        300.3|                        false|                          330|
> > |      1970-01-01T08:00:00.250+08:00|                        400.4|                        false|                          440|
> > |      1970-01-01T08:00:00.300+08:00|                        500.5|                        false|                          550|
> > +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> > Total line number = 15
> > It costs 0.267s
> > {code}
> > I looked into the data directory of the loaded server and, surprisingly, it
> > is the same as in the first experiment. Further, in the second experiment, I
> > restarted the server and the client, and queried again. This time, the
> > result is wrong again, the same as in the first experiment.
> >
> > *There is a particularly confusing point in the second experiment*: why is
> > tsfile (c) not loaded as an unsequence tsfile? Why did the query
> > executed immediately after the three tsfiles were loaded get the CORRECT
> > result?
> >
> >
> >
> > --
> > This message was sent by Atlassian Jira
> > (v8.3.4#803005)
> >
>
>
> --
> —————————————————
> Jialin Qiao
> School of Software, Tsinghua University
>
> 乔嘉林
> 清华大学 软件学院
>
