[
https://issues.apache.org/jira/browse/ASTERIXDB-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848791#comment-15848791
]
Taewoo Kim edited comment on ASTERIXDB-1779 at 2/1/17 7:05 PM:
---------------------------------------------------------------
Certain functions, especially text functions and spatial functions, are more
expensive than simple comparison functions. Based on this, I think we can
slightly change the order of predicates. The interesting point is that the
original order is not preserved in the current codebase anyway. What I would
like to do is postpone expensive function evaluations to the end.
An example of how the master branch currently optimizes a query:
{code}
let $ts := datetime("2010-12-12T00:00:00Z")
let $region := create-rectangle(create-point(0.0,0.0),create-point(100.0,100.0))
let $keyword := "verizon"
for $t in dataset TweetMessages
where $t.send-time > $ts
and spatial-intersect($t.user.sender-location, $region)
and contains($t.message-text, $keyword)
return $t
{code}
Final Plan
{code}
distribute result [%0->$$3]
-- DISTRIBUTE_RESULT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$3])
-- STREAM_PROJECT |PARTITIONED|
select (function-call: algebricks:and, Args:[function-call:
asterix:spatial-intersect, Args:[function-call: asterix:field-access-by-index,
Args:[%0->$$21, AInt32: {6}], ARectangle: { p1: APoint: { x: 0.0, y: 0.0 }, p2:
APoint: { x: 100.0, y: 100.0 }}], function-call: asterix:contains,
Args:[function-call: asterix:field-access-by-index, Args:[%0->$$3, AInt32:
{4}], AString: {verizon}], function-call: algebricks:gt, Args:[function-call:
asterix:field-access-by-index, Args:[%0->$$3, AInt32: {2}], ADateTime: {
2010-12-12T00:00:00.000Z }]])
-- STREAM_SELECT |PARTITIONED|
assign [$$21] <- [function-call: asterix:field-access-by-index,
Args:[%0->$$3, AInt32: {1}]]
-- ASSIGN |PARTITIONED|
project ([$$3])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
data-scan []<-[$$17, $$3] <- TinySocial:TweetMessages
-- DATASOURCE_SCAN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |PARTITIONED|
{code}
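To make the proposed reordering concrete, here is a minimal sketch in Python. It is not AsterixDB code. The expression encoding (nested `(function_name, args)` tuples), the `EXPENSIVE_FUNCTIONS` set, and the helper names are all assumptions made for illustration. The sketch also assumes the conjuncts of the `and` are evaluated in order and that evaluation stops at the first false conjunct, which is the only case where moving expensive predicates to the end saves work.

```python
# Hypothetical set of "expensive" functions, using names that appear in this
# thread. A real optimizer would need its own classification or cost model.
EXPENSIVE_FUNCTIONS = {"spatial-intersect", "contains", "edit-distance"}


def contains_expensive_call(expr):
    """Return True if expr, or any nested argument, calls an expensive function.

    An expression is either a leaf (a string naming a variable or constant)
    or a (function_name, args) tuple. This encoding is an assumption.
    """
    if not isinstance(expr, tuple):
        return False  # leaf: variable or constant
    name, args = expr
    return name in EXPENSIVE_FUNCTIONS or any(
        contains_expensive_call(arg) for arg in args
    )


def reorder_conjuncts(conjuncts):
    """Move conjuncts that involve expensive calls to the end.

    sorted() is stable and False sorts before True, so cheap conjuncts keep
    their relative order, and so do expensive ones.
    """
    return sorted(conjuncts, key=contains_expensive_call)


# Conjuncts in the order they appear in the select operator of the plan above.
plan_order = [
    ("spatial-intersect", ["$t.user.sender-location", "$region"]),
    ("contains", ["$t.message-text", "$keyword"]),
    ("gt", ["$t.send-time", "$ts"]),
]
reordered = reorder_conjuncts(plan_order)

# The query from the issue description: the expensive call is nested inside
# the comparison, so the check has to look at arguments as well.
issue_order = [
    ("lt", [("edit-distance", ["$i.name", '"Arnold"']), "2"]),
    ("lt", ["$i.id", "5"]),
]
issue_reordered = reorder_conjuncts(issue_order)
```

With this sketch, the `gt` comparison on `send-time` moves ahead of `spatial-intersect` and `contains`, and `$i.id < 5` moves ahead of the `edit-distance` comparison. A two-level split (cheap versus expensive) is the simplest choice; a finer-grained cost estimate per function would slot into the same sort key.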
> Processing the certain function predicates after a simple predicates
> --------------------------------------------------------------------
>
> Key: ASTERIXDB-1779
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1779
> Project: Apache AsterixDB
> Issue Type: Improvement
> Reporter: Taewoo Kim
> Assignee: Taewoo Kim
>
> For example, if we have the following AQL query,
> {code}
> for $i in dataset MyData
> where $i.id < 5 and edit-distance($i.name, "Arnold") < 2
> return $i;
> {code}
> It may be better to process the *$i.id < 5* predicate first and then process
> the *edit-distance($i.name, "Arnold") < 2* predicate, since the processing
> cost of the latter is higher than that of the former.