This is the picture (bmp format) in 2.1.





------------------ Original Message ------------------
From: "suyue"<[email protected]>;
Sent: Monday, June 24, 2019, 10:14 PM
To: "dev"<[email protected]>;

Subject: Re: Discussion: IoTDB Query on Value Columns



This is the picture in 2.1.
 

On June 24, 2019, at 9:58 PM, RUI, LEI <[email protected]> wrote:


1. Problem Description

Consider four data points (t,v) written to IoTDB in the following order:

(1,1)

(2,2)

(3,3)

(1,100)

Then, given the query “select * from root where v<10”, the expected result is 
(2,2),(3,3). This is because the later-inserted data point (1,100) should 
overwrite the earlier-inserted data point (1,1), and 100 does not satisfy v<10.

However, we find that IoTDB actually returns (1,100),(2,2),(3,3).

For more details, see JIRA-121.
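The expected semantics can be stated as a small Python sketch: apply the writes in order, letting a later write to the same timestamp replace the earlier one, and only then apply the value filter. This is a hypothetical model (a series as a dict from timestamp to value; `apply_writes` and `select_where` are illustrative names), not IoTDB code.

```python
def apply_writes(writes):
    # Hypothetical model: a series is a dict {timestamp: value}.
    series = {}
    for t, v in writes:
        series[t] = v  # last write to a timestamp wins
    return series


def select_where(series, predicate):
    # Filter the final, deduplicated data, ordered by timestamp.
    return [(t, v) for t, v in sorted(series.items()) if predicate(v)]


writes = [(1, 1), (2, 2), (3, 3), (1, 100)]
result = select_where(apply_writes(writes), lambda v: v < 10)
print(result)  # [(2, 2), (3, 3)]
```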




2. IoTDB Background

2.1 data organization

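In IoTDB, the above data points are divided into a sequential data source and 
an unsequential data source, as shown in the picture for 2.1: (1,1), (2,2) and 
(3,3) go to the sequential data source, while (1,100), which arrives out of 
timestamp order, goes to the unsequential data source.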
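To make the split concrete, here is a minimal Python sketch. The routing rule is a simplifying assumption for illustration only (a point is sequential if its timestamp exceeds every timestamp already in the sequential source, otherwise unsequential); it is not IoTDB's actual implementation, and `split_sources` is a hypothetical name. It does reproduce the ss and us used in 3.2.

```python
def split_sources(writes):
    """Hypothetical, simplified routing rule (an assumption, not IoTDB code):
    a point goes to the sequential source if its timestamp is greater than
    every timestamp already there; otherwise it goes to the unsequential one."""
    ss, us = {}, {}
    last_seq_time = None
    for t, v in writes:
        if last_seq_time is None or t > last_seq_time:
            ss[t] = v
            last_seq_time = t
        else:
            us[t] = v  # out-of-order (e.g. overwriting) write
    return ss, us


ss, us = split_sources([(1, 1), (2, 2), (3, 3), (1, 100)])
print(ss)  # {1: 1, 2: 2, 3: 3}
print(us)  # {1: 100}
```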



2.2 query process

The SQL query “select * from root where v<10” is executed as follows:

(1) Create a timeGenerator for the value filter “v<10”. It iteratively returns 
the timestamps whose values satisfy the filter.

(2) For each timestamp generated by the timeGenerator, fetch the corresponding value.
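The two steps can be sketched in Python with a generator. This is a hypothetical model, not IoTDB's API: a data source is a dict from timestamp to value, and `time_generator` and `fetch` are illustrative names. Here both steps read the same, already-merged data, so they agree; section 3.2 examines what happens when they read different data.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def time_generator(series: Dict[int, int],
                   predicate: Callable[[int], bool]) -> Iterator[int]:
    # Step (1): lazily yield timestamps whose value satisfies the filter.
    for t in sorted(series):
        if predicate(series[t]):
            yield t


def fetch(series: Dict[int, int],
          timestamps: Iterator[int]) -> List[Tuple[int, int]]:
    # Step (2): look up the value for each generated timestamp.
    return [(t, series[t]) for t in timestamps]


data = {1: 100, 2: 2, 3: 3}  # the example data after overwriting
print(fetch(data, time_generator(data, lambda v: v < 10)))  # [(2, 2), (3, 3)]
```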

 

3. Analysis

3.1 Notation
 
s: data source

s1<s2: s2 has higher priority than s1, i.e., a data point in s2 always 
overrides the data point with the same timestamp in s1.

ss: sequential data source. 

us: unsequential data source. us>ss, i.e., unsequential data source always has 
higher priority than sequential data source.

merge(s1,s2): union data points from s1 and s2. When two data points from s1 
and s2 respectively have the same timestamp, keep the data point from the 
higher priority source.

query(s): push the query condition down to data source s and return the data 
points in s that satisfy it.
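The notation maps onto a few lines of Python, assuming a data source is modeled as a dict from timestamp to value (a hypothetical model, not IoTDB's storage format). Because merge is written symmetrically above, the sketch encodes priority by argument order: the second argument is assumed to be the higher-priority source.

```python
from typing import Callable, Dict

Source = Dict[int, int]  # timestamp -> value


def merge(low: Source, high: Source) -> Source:
    """merge(s1, s2) with s1 < s2: union of both sources; on equal
    timestamps the point from the higher-priority source `high` is kept."""
    merged = dict(low)
    merged.update(high)  # entries from `high` override those from `low`
    return merged


def query(s: Source, predicate: Callable[[int], bool]) -> Source:
    """query(s): keep only the points of s that satisfy the condition."""
    return {t: v for t, v in s.items() if predicate(v)}


ss = {1: 1, 2: 2, 3: 3}
us = {1: 100}
print(merge(ss, us))                # {1: 100, 2: 2, 3: 3}
print(query(us, lambda v: v < 10))  # {}
```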

 

3.2 Current Query Plan

       The current query plan in IoTDB goes like this: 
timeGenerator=merge(query(ss),query(us))



       Using the example above:

ss=((1,1),(2,2),(3,3))

us=(1,100)

query(ss)=((1,1),(2,2),(3,3))

query(us)=ϕ

timeGenerator=merge(query(ss),query(us))=((1,1),(2,2),(3,3))







       Then, for each timestamp generated by the above timeGenerator, the value 
is fetched. Note that in this step, values are fetched from the merged data 
source, i.e., merge(ss,us), so timestamp 1 yields 100 rather than 1. The final 
result is therefore ((1,100),(2,2),(3,3)). This is where the bug comes from: 
timestamp 1 is a false positive of the timeGenerator (query(ss) matched the 
overwritten point (1,1)), and no post-filter is applied to remove it.
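The following Python sketch replays the current plan on the example, using the same dict-based model of the 3.1 notation (hypothetical helpers, not IoTDB code; argument order of `merge` encodes priority). The last lines show that applying the filter once more to the fetched values, i.e. a post-filter, removes the false positive.

```python
def merge(low, high):
    # Union of both sources; on equal timestamps the point from `high` wins.
    merged = dict(low)
    merged.update(high)
    return merged


def query(s, predicate):
    # Push the value filter down to data source s.
    return {t: v for t, v in s.items() if predicate(v)}


def pred(v):
    return v < 10


ss = {1: 1, 2: 2, 3: 3}
us = {1: 100}

# Current plan: timeGenerator = merge(query(ss), query(us))
time_gen = sorted(merge(query(ss, pred), query(us, pred)))

# Values are fetched from merge(ss, us), and no post-filter is applied.
data = merge(ss, us)
result = [(t, data[t]) for t in time_gen]
print(time_gen)  # [1, 2, 3]
print(result)    # [(1, 100), (2, 2), (3, 3)]: (1, 100) violates v<10

# A post-filter on the fetched values would drop the false positive.
filtered = [(t, v) for t, v in result if pred(v)]
print(filtered)  # [(2, 2), (3, 3)]
```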

 

3.3 Possible Solutions

We have come up with several alternative solutions:

(1) timeGenerator=query(merge(ss,us))

(2) timeGenerator=query(merge(query(ss),us))

(3) timeGenerator=query(merge(query(ss),query(us)))



(1) is the simplest solution: merge the two data sources first, then apply the 
query condition to the merged data.

(2) and (3) have different advantages. 

(2): The query condition is pushed down to ss first and then applied again to 
the merged result of query(ss) and us. When the selection query (i.e., the 
timeGenerator) and the projection query share some series, the values of those 
series cached in the timeGenerator can be reused to speed up the projection.

(3): The query condition is pushed down to the unsequential data source too. 
Thus, data not satisfying the query condition can be filtered out at an early 
stage.
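The candidate plans can be compared on the example with the same dict-based model (hypothetical helpers, not IoTDB code; `run` is an illustrative stand-in for the value-fetching step, which reads from merge(ss,us) as described in 3.2).

```python
from typing import Callable, Dict, List, Tuple

Source = Dict[int, int]  # timestamp -> value


def merge(low: Source, high: Source) -> Source:
    merged = dict(low)
    merged.update(high)  # higher-priority source wins on equal timestamps
    return merged


def query(s: Source, predicate: Callable[[int], bool]) -> Source:
    return {t: v for t, v in s.items() if predicate(v)}


def run(time_gen: Source, ss: Source, us: Source) -> List[Tuple[int, int]]:
    # Fetch values from merge(ss, us) for each generated timestamp.
    data = merge(ss, us)
    return [(t, data[t]) for t in sorted(time_gen)]


def pred(v: int) -> bool:
    return v < 10


ss = {1: 1, 2: 2, 3: 3}
us = {1: 100}

sol1 = run(query(merge(ss, us), pred), ss, us)
sol2 = run(query(merge(query(ss, pred), us), pred), ss, us)
sol3 = run(query(merge(query(ss, pred), query(us, pred)), pred), ss, us)
print(sol1)  # [(2, 2), (3, 3)]
print(sol2)  # [(2, 2), (3, 3)]
print(sol3)  # [(1, 100), (2, 2), (3, 3)]
```

Under this literal reading of the notation in 3.1, (1) and (2) return the expected (2,2),(3,3), whereas (3) still returns (1,100): query(us) drops (1,100) before the merge, so nothing overrides (1,1), which then passes the outer filter. As written, (3) would seem to need some way to account for points in us that fail the filter.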




3.4 Discussion

       Does anyone know of mature solutions to this problem in other systems? 
Also, which solution do you think is better, (2) or (3)?

       Looking forward to your advice.




Sincerely,

Lei Rui, Yue Su
