[jira] [Commented] (IOTDB-1140) optimize regular data encoding

Chao Wang (Jira) Fri, 29 Jan 2021 23:04:06 -0800


    [ 
https://issues.apache.org/jira/browse/IOTDB-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275509#comment-17275509
 ]


Chao Wang commented on IOTDB-1140:
----------------------------------

Thanks, [~qiaojialin].

The values are not sorted.

Example:

create timeseries root.db_0.tab0.salary with datatype=INT64, encoding=REGULAR;

(time, salary) ==> (1, 1000), (2, 1100), (3, 1800), (4, 1400), (5, 1500)

Only the time column is sorted (by timestamp); the salary values are not.
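To make this concrete, here is a minimal Java sketch (the class and method names are hypothetical) that computes the adjacent differences of the salary sample. One difference is negative, so the values are not sorted, and the smallest difference (-400) cannot serve as a positive step.

```java
import java.util.Arrays;

public class SalaryDiffs {
    // Difference between each value and the one before it (empty for fewer than two values).
    static long[] adjacentDiffs(long[] values) {
        long[] diffs = new long[Math.max(values.length - 1, 0)];
        for (int i = 1; i < values.length; i++) {
            diffs[i - 1] = values[i] - values[i - 1];
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Salary values from the sample above, in timestamp order 1..5.
        long[] salary = {1000, 1100, 1800, 1400, 1500};
        long[] diffs = adjacentDiffs(salary);
        System.out.println(Arrays.toString(diffs)); // prints [100, 700, -400, 100]
        System.out.println(Arrays.stream(diffs).min().getAsLong()); // prints -400
    }
}
```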

 

> optimize regular data encoding
> ------------------------------
>
>                 Key: IOTDB-1140
>                 URL: https://issues.apache.org/jira/browse/IOTDB-1140
>             Project: Apache IoTDB
>          Issue Type: Improvement
>          Components: Core/Engine
>            Reporter: Chao Wang
>            Assignee: Chao Wang
>            Priority: Critical
>
> Current REGULAR data encoding algorithm:
>  # Calculate the difference between each pair of adjacent values. The smallest
> difference is used as the fixed step (the "equal frequency").
>  # Determine the range of this batch of data from the difference between the
> last value and the first value.
>  # Traverse the batch using a BitSet: compare the difference between each pair
> of adjacent values with the step, and record true by default.
>  If a difference is not equal to the step, calculate how many steps it spans and
> set the corresponding positions to false, marking those points as missing.
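A minimal Java sketch of the three steps as described above, assuming a batch of at least two long values. The class, record, and method names are hypothetical, and this is not IoTDB's actual encoder; it throws when the smallest difference is not positive or when a difference is not a whole multiple of the step. The record needs Java 16 or later.

```java
import java.util.BitSet;

public class RegularEncodingSketch {
    // Hypothetical container for one encoded batch: first value, fixed step,
    // number of step positions in the range, and which positions hold a point.
    record Encoded(long first, long step, int size, BitSet present) {}

    static Encoded encode(long[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        // Step 1: the smallest adjacent difference becomes the fixed step.
        long step = Long.MAX_VALUE;
        for (int i = 1; i < values.length; i++) {
            step = Math.min(step, values[i] - values[i - 1]);
        }
        if (step <= 0) {
            throw new IllegalArgumentException("smallest difference must be positive, got " + step);
        }
        // Step 2: the batch range is the last value minus the first value.
        long range = values[values.length - 1] - values[0];
        int size = (int) (range / step) + 1;
        // Step 3: mark every position present by default, then clear the
        // positions skipped whenever a gap spans more than one step.
        BitSet present = new BitSet(size);
        present.set(0, size);
        int pos = 0;
        for (int i = 1; i < values.length; i++) {
            long delta = values[i] - values[i - 1];
            if (delta % step != 0) {
                throw new IllegalArgumentException("difference " + delta + " is not a multiple of step " + step);
            }
            int spanned = (int) (delta / step);
            for (int j = 1; j < spanned; j++) {
                present.clear(pos + j); // missing point
            }
            pos += spanned;
        }
        return new Encoded(values[0], step, size, present);
    }

    public static void main(String[] args) {
        Encoded e = encode(new long[] {1, 2, 4, 5});
        System.out.println(e.step() + " " + e.size() + " " + e.present()); // prints 1 5 {0, 1, 3, 4}
    }
}
```

With this sketch, the salary sample from the comment throws, because its smallest adjacent difference is -400.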
>  
> This algorithm can only identify missing points; if the data contains an
> erroneous point, it throws an exception.
> This is because the BitSet can only indicate whether a point exists at each
> step position in a segment of data.
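The limitation follows from what the BitSet stores. In a minimal decoding sketch (hypothetical names, mirroring the encoding description rather than IoTDB's actual decoder), each set bit p decodes to first + p * step. A batch can therefore only come back as an increasing run of grid values with gaps, so a point that breaks that pattern, such as 1800 followed by 1400, has no representation.

```java
import java.util.Arrays;
import java.util.BitSet;

public class RegularDecodingSketch {
    // Rebuild a batch from its first value, step, and presence bits: every set
    // bit p yields first + p * step, in increasing order of p.
    static long[] decode(long first, long step, BitSet present) {
        long[] out = new long[present.cardinality()];
        int k = 0;
        for (int p = present.nextSetBit(0); p >= 0; p = present.nextSetBit(p + 1)) {
            out[k++] = first + p * step;
        }
        return out;
    }

    public static void main(String[] args) {
        BitSet present = new BitSet();
        present.set(0, 5);
        present.clear(2); // one missing point
        System.out.println(Arrays.toString(decode(1000, 100, present))); // prints [1000, 1100, 1300, 1400]
    }
}
```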
>  
> There is, however, room for optimization.
> If a column contains an abnormal value, taking the minimum adjacent
> difference directly as the step skews the algorithm.
> Example: 1000, 1100, 1800, 1400, 1500...
> The current algorithm cannot handle this data.
> 1800 is an erroneous point; we should identify it and revise the data.
> The revised data should be: 1000, 1100, 1300, 1400, 1500
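The issue does not say how the error point should be detected or repaired. As one possible heuristic (an assumption, not the method settled on for IoTDB-1140), the Java sketch below uses the most frequent positive adjacent difference as the step. It flags an interior point that is not strictly between its neighbours, or not on the grid first + k * step, and snaps that point to the nearest grid value between the neighbours. All names are hypothetical. On the sample it reproduces the revised data listed in the issue.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ErrorPointRepairSketch {
    // Heuristic step: the most frequent positive adjacent difference
    // (the smaller one wins a tie), so one outlier does not decide the step.
    static long estimateStep(long[] values) {
        Map<Long, Integer> counts = new HashMap<>();
        for (int i = 1; i < values.length; i++) {
            long d = values[i] - values[i - 1];
            if (d > 0) {
                counts.merge(d, 1, Integer::sum);
            }
        }
        long best = -1;
        int bestCount = 0;
        for (Map.Entry<Long, Integer> e : counts.entrySet()) {
            long d = e.getKey();
            int c = e.getValue();
            if (c > bestCount || (c == bestCount && d < best)) {
                best = d;
                bestCount = c;
            }
        }
        if (best <= 0) {
            throw new IllegalArgumentException("no positive adjacent difference");
        }
        return best;
    }

    // Flags an interior point as an error if it is not strictly between its
    // neighbours or not on the grid first + k * step, then replaces it with
    // the grid value strictly between the neighbours closest to the observed one.
    // The first and last points are kept as-is (they have only one neighbour).
    static long[] repair(long[] values) {
        long[] out = values.clone();
        if (out.length < 3) {
            return out;
        }
        long step = estimateStep(out);
        long first = out[0];
        for (int i = 1; i < out.length - 1; i++) {
            long prev = out[i - 1]; // may already be repaired
            long next = out[i + 1];
            long v = out[i];
            boolean onGrid = Math.floorMod(v - first, step) == 0;
            if (v > prev && v < next && onGrid) {
                continue;
            }
            // Smallest grid value above prev and largest grid value below next.
            long gLo = first + (Math.floorDiv(prev - first, step) + 1) * step;
            long gHi = first + Math.floorDiv(next - 1 - first, step) * step;
            if (gLo > gHi) {
                continue; // no grid slot between the neighbours; leave the point alone
            }
            long candidate;
            if (v >= next) {
                candidate = gHi;
            } else if (v <= prev) {
                candidate = gLo;
            } else {
                long lower = first + Math.floorDiv(v - first, step) * step;
                long upper = lower + step;
                candidate = (v - lower <= upper - v) ? lower : upper;
                candidate = Math.max(gLo, Math.min(gHi, candidate));
            }
            out[i] = candidate;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] salary = {1000, 1100, 1800, 1400, 1500};
        System.out.println(Arrays.toString(repair(salary))); // prints [1000, 1100, 1300, 1400, 1500]
    }
}
```

Taking the most frequent difference instead of the minimum keeps a single outlier from dictating the step. Between 1100 and 1400, both 1200 and 1300 are grid slots, and the sketch chooses 1300 only because it is the slot closest to the observed 1800. A batch whose first point is itself an error would need a different reference point.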



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
