[
https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275501#comment-17275501
]
Jialin Qiao commented on IOTDB-1140:
------------------------------------
Hi, in IoTDB, the points of a time series are sorted before they are encoded.
So out-of-order data will not happen; the sequence will be 1000, 1100, 1400, 1500, 1800.
A more complicated case is: 1000, 1201, 1299, 1303
Here the data is not at a fixed frequency.
Maybe we could check the data and use TS_2DIFF if it is not at a fixed
frequency?
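
A minimal sketch of such a check, assuming "fixed frequency" means that every
gap between adjacent timestamps is a whole multiple of the smallest gap (so
missing points are still allowed). The function name is hypothetical and is
not part of IoTDB:

```python
def is_fixed_frequency(timestamps):
    """Return True if every gap is a whole multiple of the smallest gap.

    Assumes the timestamps are already sorted, as described above.
    """
    if len(timestamps) < 3:
        return True  # too few points to contradict a fixed interval
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    interval = min(deltas)
    if interval <= 0:
        return False  # duplicate or unsorted points: treat as not regular
    return all(d % interval == 0 for d in deltas)


print(is_fixed_frequency([1000, 1100, 1400, 1500, 1800]))  # True
print(is_fixed_frequency([1000, 1201, 1299, 1303]))        # False
```

When the check returns False, the writer could fall back to another encoding
instead of the regular one.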
> optimize regular data encoding
> ------------------------------
>
> Key: IOTDB-1140
> URL: https://issues.apache.org/jira/browse/IOTDB-1140
> Project: Apache IoTDB
> Issue Type: Improvement
> Components: Core/Engine
> Reporter: Chao Wang
> Assignee: Chao Wang
> Priority: Critical
>
> Current regular data encoding algorithm:
> # Calculate the difference between each pair of adjacent values. The smallest
> difference is used as the fixed-frequency interval.
> # Determine the data range of this batch of data based on the difference
> between the last value and the first value.
> # Traverse this batch of data using a BitSet whose bits are true by default,
> and compare the difference between each pair of adjacent values with the
> fixed-frequency interval.
> If a difference is not equal to the interval, calculate how many intervals
> it spans and set the bits at the corresponding positions to false,
> indicating that those points are missing.
>
> This algorithm can only identify missing points; if the data contains an
> error point, it throws an exception,
> because a BitSet can only indicate whether a point exists at each
> fixed-frequency position in a segment of data.
>
> But there is room for optimization.
> If a column of values contains an abnormal value, the algorithm goes wrong
> when the minimum difference is taken directly as the interval.
> Sample: 1000, 1100, 1800, 1400, 1500...
> The current algorithm cannot be used on this data.
> 1800 is an error point; we should identify the error point and revise the data.
> The revised data should be: 1000, 1100, 1300, 1400, 1500
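
The three steps quoted above can be sketched roughly as follows. This is only
an illustration of the description in the issue, not the actual IoTDB encoder;
the function name, the use of a Python list in place of a BitSet, and the
choice of ValueError are all assumptions:

```python
def encode_regular(timestamps):
    """Return (first, interval, bitmap); bitmap[i] is True if slot i has a point."""
    if len(timestamps) < 2:
        raise ValueError("need at least two points")
    # Step 1: the smallest adjacent difference is taken as the interval.
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    interval = min(deltas)
    if interval <= 0 or any(d % interval != 0 for d in deltas):
        # An error point breaks the grid, so a bitmap cannot represent it.
        raise ValueError("data contains a point that is not on the interval grid")
    # Step 2: the range of the batch gives the number of slots.
    first, last = timestamps[0], timestamps[-1]
    slots = (last - first) // interval + 1
    # Step 3: every slot is present by default; clear the slots that are skipped.
    bitmap = [True] * slots
    pos = 0
    for d in deltas:
        steps = d // interval
        for missing in range(pos + 1, pos + steps):
            bitmap[missing] = False
        pos += steps
    return first, interval, bitmap


print(encode_regular([1000, 1100, 1400, 1500, 1800]))
# (1000, 100, [True, True, False, False, True, True, False, False, True])
```

With the sample 1000, 1100, 1800, 1400, 1500 the smallest difference is
negative, so this sketch raises an error instead of producing a bitmap, which
matches the limitation described in the issue.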
--
This message was sent by Atlassian Jira
(v8.3.4#803005)