Hi,
I wrote the `load` function, so I should be the one to answer this issue.
The result of your first experiment looks abnormal, and I need to reproduce it
to locate the problem. In the intended implementation, `load` would rename the
tsfile before loading it into the engine, so that the order of `fileName` stays
the same as the order of `dataTime`.
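To make the mismatch concrete, here is a minimal Java sketch. It is not the
IoTDB code: `TsFileInfo` and `FileOrderDemo` are hypothetical names used only
for illustration, and the file names and time ranges are the ones from the
issue. It sorts the three files once by the timestamp prefix of the file name
and once by data start time, and the two orders differ, which is why a rename
on load would be needed to keep them consistent.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical holder for illustration only; not an IoTDB class.
final class TsFileInfo {
  final String fileName;
  final long startTime;
  final long endTime;

  TsFileInfo(String fileName, long startTime, long endTime) {
    this.fileName = fileName;
    this.startTime = startTime;
    this.endTime = endTime;
  }

  // Leading creation timestamp of "<millis>-<version>-0.tsfile".
  long nameTimestamp() {
    return Long.parseLong(fileName.substring(0, fileName.indexOf('-')));
  }
}

final class FileOrderDemo {
  // Returns the file names after sorting a copy of the list with the given order.
  static List<String> namesSortedBy(List<TsFileInfo> files, Comparator<TsFileInfo> order) {
    List<TsFileInfo> copy = new ArrayList<>(files);
    copy.sort(order);
    List<String> names = new ArrayList<>();
    for (TsFileInfo f : copy) {
      names.add(f.fileName);
    }
    return names;
  }

  // The three tsfiles from the issue: (a) 1-5, (b) 100-300, (c) 10-50.
  static List<TsFileInfo> issueFiles() {
    List<TsFileInfo> files = new ArrayList<>();
    files.add(new TsFileInfo("1575813520203-101-0.tsfile", 1, 5));
    files.add(new TsFileInfo("1575813520669-103-0.tsfile", 100, 300));
    files.add(new TsFileInfo("1575813521063-105-0.tsfile", 10, 50));
    return files;
  }

  public static void main(String[] args) {
    // Order by file name timestamp: (a), (b), (c).
    System.out.println(namesSortedBy(issueFiles(),
        Comparator.comparingLong(TsFileInfo::nameTimestamp)));
    // Order by data start time: (a), (c), (b).
    System.out.println(namesSortedBy(issueFiles(),
        Comparator.comparingLong((TsFileInfo f) -> f.startTime)));
  }
}
```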
Your second experiment is a normal case. The time range of the unsequence file
does not actually overlap with any file in the sequence list; it became
`unsequence` only because its data was inserted late. So in the `load`
function, I check for a time range conflict and take different actions
depending on the result.
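The conflict check can be sketched roughly as follows. This is a hedged
illustration in Java, not the actual `load` implementation: the class
`LoadPlacementSketch`, the `TimeRange` type, and the convention of returning -1
for "treat as unsequence" are all assumptions made for the example. It assumes
the sequence list is already sorted by time and contains no overlapping ranges.

```java
import java.util.List;

// Hypothetical sketch; names and return convention are assumptions, not IoTDB API.
final class LoadPlacementSketch {

  // Closed time interval [start, end] covered by one tsfile.
  static final class TimeRange {
    final long start;
    final long end;

    TimeRange(long start, long end) {
      this.start = start;
      this.end = end;
    }
  }

  // Two closed intervals overlap when each one starts before the other ends.
  static boolean overlaps(TimeRange a, TimeRange b) {
    return a.start <= b.end && b.start <= a.end;
  }

  // Returns the index at which the incoming file fits into the time-sorted
  // sequence list, or -1 if it overlaps an existing file (in which case this
  // sketch assumes it would be handled as an unsequence file).
  static int insertPosition(List<TimeRange> sequence, TimeRange incoming) {
    int position = 0;
    for (TimeRange existing : sequence) {
      if (overlaps(existing, incoming)) {
        return -1;
      }
      if (existing.end < incoming.start) {
        position++;
      }
    }
    return position;
  }
}
```

With the ranges from the second experiment, a sequence list holding 1-5 and
100-300 and an incoming file covering 10-50, `insertPosition` returns 1: the
file has no conflict and fits between the two existing files.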
I hope this helps, and thank you for running the experiments.
Best Regards,
—————————————————
Tianan Li
School of Software, Tsinghua University
> On Dec 9, 2019, at 12:14 AM, Lei Rui (Jira) <[email protected]> wrote:
>
> Lei Rui created IOTDB-346:
> -----------------------------
>
> Summary: StorageGroupProcessor.sequenceFileList is ordered by
> fileName rather than dataTime
> Key: IOTDB-346
> URL: https://issues.apache.org/jira/browse/IOTDB-346
> Project: Apache IoTDB
> Issue Type: Bug
> Reporter: Lei Rui
>
>
> `StorageGroupProcessor.sequenceFileList` is ordered by fileName rather than
> by time of data, as reflected in the `StorageGroupProcessor.getAllFiles`
> method code:
> {code:java}
> tsFiles.sort(this::compareFileName);
> {code}
> ----
> I use the following examples to expose the bug when the order of fileName is
> inconsistent with that of dataTime.
>
> First, for preparation, I created three tsfiles using the following sql:
> {code:java}
> SET STORAGE GROUP TO root.ln.wf01.wt01
> CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN,
> ENCODING=PLAIN
> CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=DOUBLE,
> ENCODING=PLAIN
> CREATE TIMESERIES root.ln.wf01.wt01.hardware WITH DATATYPE=INT32,
> ENCODING=PLAIN
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> values(1, 1.1, false, 11)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> values(2, 2.2, true, 22)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> values(3, 3.3, false, 33)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> values(4, 4.4, false, 44)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> values(5, 5.5, false, 55)
> flush
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> values(100, 100.1, false, 110)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> values(150, 200.2, true, 220)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> values(200, 300.3, false, 330)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> values(250, 400.4, false, 440)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> values(300, 500.5, false, 550)
> flush
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> values(10, 10.1, false, 110)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> values(20, 20.2, true, 220)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> values(30, 30.3, false, 330)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> values(40, 40.4, false, 440)
> INSERT INTO root.ln.wf01.wt01(timestamp,temperature,status, hardware)
> values(50, 50.5, false, 550)
> flush
> {code}
> The tsfiles created are organized in the following directory structure:
> {code:java}
> |data
> |--sequence
> |----root.ln.wf01.wt01
> |------1575813520203-101-0.tsfile
> |------1575813520203-101-0.tsfile.resource
> |------1575813520669-103-0.tsfile
> |------1575813520669-103-0.tsfile.resource
> |--unsequence
> |----root.ln.wf01.wt01
> |------1575813521063-105-0.tsfile
> |------1575813521063-105-0.tsfile.resource
> {code}
> ||File Name||Data Time||
> |(a) 1575813520203-101-0.tsfile|1-5|
> |(c) 1575813521063-105-0.tsfile|10-50|
> |(b) 1575813520669-103-0.tsfile|100-300|
>
> Note how the order of fileName is inconsistent with that of dataTime.
>
> By the way, if you look into the code, you will know how the file name is
> generated:
> {code:java}
> System.currentTimeMillis() + IoTDBConstant.TSFILE_NAME_SEPARATOR +
> versionController.nextVersion() + IoTDBConstant.TSFILE_NAME_SEPARATOR + "0" +
> TSFILE_SUFFIX
> {code}
> ----
> Then, I loaded the three tsfiles into another brand new IoTDB. I ran two
> experiments, each with a different loading order.
>
> In the first experiment, the tsfiles were loaded in their data time order.
> That is,
> {code:java}
> IoTDB> load 1575813520203-101-0.tsfile // tsfile (a), with data time 1-5
> IoTDB> load 1575813521063-105-0.tsfile // tsfile (c), with data time 10-50
> IoTDB> load 1575813520669-103-0.tsfile // tsfile (b), with data time 100-300
> {code}
> After loading successfully, I did the following query in the same client
> window and got the wrong result:
> {code:java}
> IoTDB> select * from root
> +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> |                               Time|root.ln.wf01.wt01.temperature|     root.ln.wf01.wt01.status|   root.ln.wf01.wt01.hardware|
> +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> |      1970-01-01T08:00:00.001+08:00|                          1.1|                        false|                           11|
> |      1970-01-01T08:00:00.002+08:00|                          2.2|                         true|                           22|
> |      1970-01-01T08:00:00.003+08:00|                          3.3|                        false|                           33|
> |      1970-01-01T08:00:00.004+08:00|                          4.4|                        false|                           44|
> |      1970-01-01T08:00:00.005+08:00|                          5.5|                        false|                           55|
> |      1970-01-01T08:00:00.100+08:00|                        100.1|                        false|                          110|
> |      1970-01-01T08:00:00.150+08:00|                        200.2|                         true|                          220|
> |      1970-01-01T08:00:00.200+08:00|                        300.3|                        false|                          330|
> |      1970-01-01T08:00:00.250+08:00|                        400.4|                        false|                          440|
> |      1970-01-01T08:00:00.300+08:00|                        500.5|                        false|                          550|
> |      1970-01-01T08:00:00.010+08:00|                         10.1|                        false|                          110|
> |      1970-01-01T08:00:00.020+08:00|                         20.2|                         true|                          220|
> |      1970-01-01T08:00:00.030+08:00|                         30.3|                        false|                          330|
> |      1970-01-01T08:00:00.040+08:00|                         40.4|                        false|                          440|
> |      1970-01-01T08:00:00.050+08:00|                         50.5|                        false|                          550|
> +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> Total line number = 15
> It costs 0.198s
> {code}
> I checked the data directory of the loaded server and it looks like this:
> {code:java}
> |data
> |--sequence
> |----root.ln.wf01.wt01
> |------1575813520203-101-0.tsfile
> |------1575813520203-101-0.tsfile.resource
> |------1575813520669-103-0.tsfile
> |------1575813520669-103-0.tsfile.resource
> |------1575813521063-105-0.tsfile
> |------1575813521063-105-0.tsfile.resource
> |--unsequence
> {code}
> ----
> In the second experiment, the tsfiles were loaded in their file name order.
> That is,
> {code:java}
> IoTDB> load 1575813520203-101-0.tsfile // tsfile (a), with data time 1-5
> IoTDB> load 1575813520669-103-0.tsfile // tsfile (b), with data time 100-300
> IoTDB> load 1575813521063-105-0.tsfile // tsfile (c), with data time 10-50
> {code}
> Note that I expected tsfile (c) to be loaded into the unsequence data
> directory.
>
> After loading successfully, I did the following query in the same client
> window and got the CORRECT result:
> {code:java}
> IoTDB> select * from root
> +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> |                               Time|root.ln.wf01.wt01.temperature|     root.ln.wf01.wt01.status|   root.ln.wf01.wt01.hardware|
> +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> |      1970-01-01T08:00:00.001+08:00|                          1.1|                        false|                           11|
> |      1970-01-01T08:00:00.002+08:00|                          2.2|                         true|                           22|
> |      1970-01-01T08:00:00.003+08:00|                          3.3|                        false|                           33|
> |      1970-01-01T08:00:00.004+08:00|                          4.4|                        false|                           44|
> |      1970-01-01T08:00:00.005+08:00|                          5.5|                        false|                           55|
> |      1970-01-01T08:00:00.010+08:00|                         10.1|                        false|                          110|
> |      1970-01-01T08:00:00.020+08:00|                         20.2|                         true|                          220|
> |      1970-01-01T08:00:00.030+08:00|                         30.3|                        false|                          330|
> |      1970-01-01T08:00:00.040+08:00|                         40.4|                        false|                          440|
> |      1970-01-01T08:00:00.050+08:00|                         50.5|                        false|                          550|
> |      1970-01-01T08:00:00.100+08:00|                        100.1|                        false|                          110|
> |      1970-01-01T08:00:00.150+08:00|                        200.2|                         true|                          220|
> |      1970-01-01T08:00:00.200+08:00|                        300.3|                        false|                          330|
> |      1970-01-01T08:00:00.250+08:00|                        400.4|                        false|                          440|
> |      1970-01-01T08:00:00.300+08:00|                        500.5|                        false|                          550|
> +-----------------------------------+-----------------------------+-----------------------------+-----------------------------+
> Total line number = 15
> It costs 0.267s
> {code}
> I looked into the data directory of the loaded server and, surprisingly, it
> was the same as in the first experiment. Still in the second experiment, I
> then restarted the server and the client and queried again. This time, the
> result was wrong, the same as in the first experiment.
>
> *One point about the second experiment is especially confusing*: why was
> tsfile (c) not loaded as an unsequence tsfile? And why did the query executed
> immediately after the three tsfiles were loaded return the CORRECT result?
>
>
>