[jira] [Comment Edited] (ASTERIXDB-1779) Processing the certain function predicates after a simple predicates

Taewoo Kim (JIRA) Wed, 01 Feb 2017 11:07:34 -0800

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848791#comment-15848791 ]


Taewoo Kim edited comment on ASTERIXDB-1779 at 2/1/17 7:05 PM:
---------------------------------------------------------------

Certain functions, especially text functions and spatial functions, are more 
expensive to evaluate than simple comparison functions. Based on this, I think 
we can slightly change the order of predicates. The interesting point is that 
the original order is not preserved in the current codebase anyway. So I would 
like to postpone the evaluation of expensive functions to the end.

An example of how the master branch currently optimizes such a query:
{code}
let $ts := datetime("2010-12-12T00:00:00Z")
let $region := create-rectangle(create-point(0.0,0.0),create-point(100.0,100.0))
let $keyword := "verizon"
for $t in dataset TweetMessages
where $t.send-time > $ts
    and spatial-intersect($t.user.sender-location, $region)
    and contains($t.message-text, $keyword)
return $t
{code}

Final Plan
{code}
distribute result [%0->$$3]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
    project ([$$3])
    -- STREAM_PROJECT  |PARTITIONED|
      select (function-call: algebricks:and, Args:[function-call: asterix:spatial-intersect, Args:[function-call: asterix:field-access-by-index, Args:[%0->$$21, AInt32: {6}], ARectangle: { p1: APoint: { x: 0.0, y: 0.0 }, p2: APoint: { x: 100.0, y: 100.0 }}], function-call: asterix:contains, Args:[function-call: asterix:field-access-by-index, Args:[%0->$$3, AInt32: {4}], AString: {verizon}], function-call: algebricks:gt, Args:[function-call: asterix:field-access-by-index, Args:[%0->$$3, AInt32: {2}], ADateTime: { 2010-12-12T00:00:00.000Z }]])
      -- STREAM_SELECT  |PARTITIONED|
        assign [$$21] <- [function-call: asterix:field-access-by-index, Args:[%0->$$3, AInt32: {1}]]
        -- ASSIGN  |PARTITIONED|
          project ([$$3])
          -- STREAM_PROJECT  |PARTITIONED|
            exchange
            -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
              data-scan []<-[$$17, $$3] <- TinySocial:TweetMessages
              -- DATASOURCE_SCAN  |PARTITIONED|
                exchange
                -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                  empty-tuple-source
                  -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
{code}
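
In the plan above, the select operator lists its conjuncts in the order 
spatial-intersect, contains, gt, so the cheap comparison comes last. The 
proposed reordering could be sketched roughly as follows. This is only an 
illustration in Python: the two cost classes, the set of "expensive" function 
names, and the helper names are assumptions made for this sketch, not 
AsterixDB's actual cost model or rewrite rule.

```python
# Hypothetical cost classes (assumption for this sketch, not AsterixDB's model).
CHEAP, EXPENSIVE = 0, 1

# Function names treated as expensive here; the set itself is an assumption.
EXPENSIVE_FUNCTIONS = {"spatial-intersect", "contains", "edit-distance"}


def cost_class(function_name):
    """Return the assumed cost class of a conjunct's top-level function."""
    return EXPENSIVE if function_name in EXPENSIVE_FUNCTIONS else CHEAP


def reorder_conjuncts(conjuncts):
    """Move cheap conjuncts ahead of expensive ones.

    sorted() is stable, so conjuncts within the same cost class keep
    their original relative order.
    """
    return sorted(conjuncts, key=cost_class)


# Argument order of algebricks:and in the plan above.
plan_order = ["spatial-intersect", "contains", "gt"]
print(reorder_conjuncts(plan_order))
# prints ['gt', 'spatial-intersect', 'contains']
```

The sketch only looks at the top-level function of each conjunct. A predicate 
such as edit-distance(...) < 2, where the expensive call is nested inside a 
comparison, would need the whole expression tree to be inspected.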



> Processing the certain function predicates after a simple predicates
> --------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1779
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1779
>             Project: Apache AsterixDB
>          Issue Type: Improvement
>            Reporter: Taewoo Kim
>            Assignee: Taewoo Kim
>
> For example, if we have the following AQL query,
> {code}
> for $i in dataset MyData
>    where $i.id < 5 and edit-distance($i.name, "Arnold") < 2
>    return $i;
> {code}
> It may be better to evaluate the *$i.id < 5* predicate first and then the 
> *edit-distance($i.name, "Arnold") < 2* predicate, since the latter is more 
> expensive to evaluate than the former.
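
To make the cost argument in the issue description concrete, here is a small 
Python sketch that counts how often the expensive predicate runs under each 
ordering, assuming the conjunction is evaluated left to right with 
short-circuiting. The sample records, the helper names, and the 
edit-distance implementation are all assumptions made for illustration; they 
are not taken from AsterixDB.

```python
def edit_distance(a, b):
    """Levenshtein distance using the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def count_expensive_calls(records, cheap_first):
    """Filter records and report how many times edit_distance was called."""
    calls = 0

    def expensive(r):
        nonlocal calls
        calls += 1
        return edit_distance(r["name"], "Arnold") < 2

    def cheap(r):
        return r["id"] < 5

    if cheap_first:
        # Python's `and` short-circuits: expensive() runs only if cheap() holds.
        matches = [r for r in records if cheap(r) and expensive(r)]
    else:
        matches = [r for r in records if expensive(r) and cheap(r)]
    return matches, calls


# Made-up sample data: 100 records with ids 0..99.
records = [{"id": i, "name": "Arnold" if i % 2 == 0 else "Bernard"}
           for i in range(100)]

for cheap_first in (True, False):
    matches, calls = count_expensive_calls(records, cheap_first)
    print(cheap_first, len(matches), calls)
```

On this sample data both orderings return the same three records, but the 
cheap-first ordering calls edit_distance 5 times instead of 100.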



