This is the picture (bmp format) in 2.1.
------------------ Original Message ------------------
From: "suyue"<[email protected]>
Sent: Monday, June 24, 2019, 10:14 PM
To: "dev"<[email protected]>
Subject: Re: Discussion: IoTDB Query on Value Columns

This is the picture in 2.1.

On June 24, 2019, at 9:58 PM, RUI, LEI <[email protected]> wrote:

1. Problem Description

Consider four data points (t,v) written to IoTDB in the following order:

(1,1) (2,2) (3,3) (1,100)

Then, given the query “select * from root where v<10”, the expected result is (2,2)(3,3). This is because the later inserted data point (1,100) should cover the earlier inserted data point (1,1). However, we find that in IoTDB the queried result is (1,100),(2,2),(3,3). For more details, see JIRA-121.

2. IoTDB Background

2.1 data organization

In IoTDB, the above data points are divided between a sequential data source and an unsequential data source, as shown below.

2.2 query process

The execution process of the SQL “select * from root where v<10” is as follows:

(1) Create a timeGenerator for the value filter “v<10”. It iteratively returns the timestamps that satisfy the filter.
(2) Fetch the value at each timestamp generated by the timeGenerator.

3. Analysis

3.1 Annotation Description

s: data source.
s1<s2: s2 has higher priority than s1, which means that data points in s2 always cover those with the same timestamps in s1.
ss: sequential data source.
us: unsequential data source.
us>ss, i.e., the unsequential data source always has higher priority than the sequential data source.
merge(s1,s2): union the data points from s1 and s2. When a data point from s1 and a data point from s2 have the same timestamp, keep the one from the higher-priority source.
query(s): apply the query pushdown to the data source s and return the query result.

3.2 Current Query Plan

The current query plan in IoTDB goes like this:

timeGenerator=merge(query(ss),query(us))

Using the above example:

ss=((1,1),(2,2),(3,3))
us=(1,100)
query(ss)=((1,1),(2,2),(3,3))
query(us)=ϕ
timeGenerator=merge(query(ss),query(us))=((1,1),(2,2),(3,3))

Then the values are fetched by the timestamps generated by the above timeGenerator. Note that in this step, we fetch values from the merged data source, i.e., merge(ss,us). The final result is ((1,100),(2,2),(3,3)).

This is where the bug comes from: no post-filter is applied to the false positives in the timeGenerator.

3.3 Possible Solutions

We have come up with several alternative solutions.

(1) timeGenerator=query(merge(ss,us))
(2) timeGenerator=query(merge(query(ss),us))
(3) timeGenerator=query(merge(query(ss),query(us)))

(1) is a simple solution. (2) and (3) have different advantages.

(2): The query condition is pushed down to ss first and then applied to the merged result of query(ss) and us. When the selection query (corresponding to the timeGenerator) and the projection query have series in common, we can use the values of those series cached in the timeGenerator to speed up the projection process.

(3): The query condition is pushed down to the unsequential data source too. Thus, data not satisfying the query condition can be filtered out at an early stage.

3.4 Discussion

Does anyone know of any mature solutions in other systems? Or which solution do you think is better, (2) or (3)?

Looking forward to your advice.

Sincerely,
Lei Rui, Yue Su
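
For reference, the plans in 3.2 and 3.3 can be traced on the example data with a small in-memory sketch. This is only a toy model under stated assumptions: each data source is a Python list of (t,v) pairs, and merge, query, fetch and times are hypothetical helpers named after the notation in 3.1, not IoTDB APIs.

```python
# Toy model of the notation in 3.1. All names here are hypothetical
# helpers for illustration; none of them are IoTDB APIs.

def merge(low, high):
    """Union two sources; on equal timestamps keep the point from `high`."""
    merged = dict(low)
    merged.update(dict(high))  # higher-priority source covers the lower one
    return sorted(merged.items())

def query(source, pred):
    """Apply the value filter to a single source."""
    return [(t, v) for t, v in source if pred(v)]

def times(points):
    """Timestamps a timeGenerator would emit for these points."""
    return [t for t, _ in points]

def fetch(timestamps, ss, us):
    """Fetch values from merge(ss,us) for the generated timestamps."""
    merged = dict(merge(ss, us))
    return [(t, merged[t]) for t in timestamps]

ss = [(1, 1), (2, 2), (3, 3)]   # sequential data source
us = [(1, 100)]                 # unsequential data source, us>ss
pred = lambda v: v < 10         # value filter "v<10"

# 3.2 current plan: merge(query(ss),query(us))
current = fetch(times(merge(query(ss, pred), query(us, pred))), ss, us)

# 3.3 (1): query(merge(ss,us))
plan1 = fetch(times(query(merge(ss, us), pred)), ss, us)

# 3.3 (2): query(merge(query(ss),us))
plan2 = fetch(times(query(merge(query(ss, pred), us), pred)), ss, us)

# 3.3 (3): query(merge(query(ss),query(us)))
plan3 = fetch(times(query(merge(query(ss, pred), query(us, pred)), pred)), ss, us)

print(current)  # [(1, 100), (2, 2), (3, 3)]
print(plan1)    # [(2, 2), (3, 3)]
print(plan2)    # [(2, 2), (3, 3)]
print(plan3)    # [(1, 100), (2, 2), (3, 3)]
```

In this toy model the current plan reproduces the result reported in 3.2, and (1) and (2) return the expected (2,2)(3,3). A literal reading of (3) still emits timestamp 1 here, because query(us) drops (1,100) before it can cover (1,1) in the merge; the real proposal may depend on details this sketch does not capture.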
