[
https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275515#comment-17275515
]
Jialin Qiao commented on IOTDB-1140:
------------------------------------
Oh, the regular method is designed to encode the time column. Maybe we need to
restrict the usage of this encoding to timestamps.
Eliminating the wrong value is OK, but should we discard the corresponding
timestamp at the same time?
> optimize regular data encoding
> ------------------------------
>
> Key: IOTDB-1140
> URL: https://issues.apache.org/jira/browse/IOTDB-1140
> Project: Apache IoTDB
> Issue Type: Improvement
> Components: Core/Engine
> Reporter: Chao Wang
> Assignee: Chao Wang
> Priority: Critical
>
> Current regular data encoding algorithm:
> # Calculate the difference between each pair of adjacent values. The smallest
> difference is used as the regular interval (the "equal frequency").
> # Determine the data range of this batch from the difference between the
> last value and the first value.
> # Traverse the batch using a BitSet whose bits are true by default, and
> compare the difference between each pair of adjacent values with the
> interval. If a difference is not equal to the interval, calculate how many
> intervals it spans and set the bits at the corresponding positions to false,
> indicating that those points are missing.
>
> This algorithm can only identify missing points. If the data contains an
> error point, it throws an exception, because the BitSet can only indicate
> whether each expected point exists in a segment of data.
>
> But there is room for optimization.
> If a column of values contains an abnormal value, taking the minimum
> difference directly gives a wrong interval.
> Sample: 1000,1100,1800,1400,1500...
> The current algorithm cannot be used on this data.
> 1800 is an error point; we should identify error points and revise the data.
> The revised data should be: 1000,1100,1300,1400,1500
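
For reference, a minimal Java sketch of the steps quoted above, run on the
sample from the issue. It is not the IoTDB implementation: the class and
method names (RegularSketch, minDelta, modeDelta, markPresent, revise) are
hypothetical, and two choices are assumptions that the issue does not state:
taking the most frequent difference as the interval instead of the minimum,
and revising an out-of-order point to one interval before its successor.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class RegularSketch {

  // Step 1 as described in the issue: the smallest adjacent difference.
  static long minDelta(long[] v) {
    long min = Long.MAX_VALUE;
    for (int i = 1; i < v.length; i++) {
      min = Math.min(min, v[i] - v[i - 1]);
    }
    return min;
  }

  // Assumption (not from the issue): use the most frequent adjacent
  // difference, which a single abnormal value cannot easily distort.
  static long modeDelta(long[] v) {
    Map<Long, Integer> counts = new HashMap<>();
    long best = 0;
    int bestCount = 0;
    for (int i = 1; i < v.length; i++) {
      long d = v[i] - v[i - 1];
      int c = counts.merge(d, 1, Integer::sum);
      if (c > bestCount) {
        bestCount = c;
        best = d;
      }
    }
    return best;
  }

  // Steps 2 and 3: one bit per slot between the first and last value.
  // true = point present, false = point missing.
  static BitSet markPresent(long[] v, long interval) {
    if (interval <= 0) {
      throw new IllegalArgumentException("interval must be positive: " + interval);
    }
    int slots = (int) ((v[v.length - 1] - v[0]) / interval) + 1;
    BitSet present = new BitSet(slots);
    present.set(0, slots); // every slot starts as true
    int slot = 0;
    for (int i = 1; i < v.length; i++) {
      long d = v[i] - v[i - 1];
      if (d <= 0 || d % interval != 0) {
        // An error point cannot be expressed as "missing", so give up.
        throw new IllegalArgumentException("not on the grid: " + v[i]);
      }
      int gap = (int) (d / interval);
      present.clear(slot + 1, slot + gap); // skipped slots are missing
      slot += gap;
    }
    return present;
  }

  // Assumption (not from the issue): a point that is larger than its
  // successor, while its predecessor is smaller than that successor,
  // is an error point and is moved to one interval before the successor.
  static long[] revise(long[] v, long interval) {
    long[] out = v.clone();
    for (int i = 1; i + 1 < out.length; i++) {
      if (out[i] > out[i + 1] && out[i - 1] < out[i + 1]) {
        out[i] = out[i + 1] - interval;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    long[] sample = {1000, 1100, 1800, 1400, 1500};
    System.out.println("min delta:  " + minDelta(sample));
    long interval = modeDelta(sample);
    System.out.println("mode delta: " + interval);
    long[] revised = revise(sample, interval);
    System.out.println("revised:    " + Arrays.toString(revised));
    System.out.println("present:    " + markPresent(revised, interval));
  }
}
```

On the sample, minDelta returns -400, so no usable interval is found, and
markPresent rejects the unrevised data even with an interval of 100. With
modeDelta the interval is 100, revise yields 1000,1100,1300,1400,1500, and
the BitSet then marks 1200 as the only missing point.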
--
This message was sent by Atlassian Jira
(v8.3.4#803005)