[jira] [Commented] (PHOENIX-1556) Base hash versus sort merge join decision on cost

James Taylor (JIRA) Fri, 09 Feb 2018 11:36:19 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358851#comment-16358851
 ]


James Taylor commented on PHOENIX-1556:
---------------------------------------

Wow, this is really awesome, [~maryannxue]. I love the tests. A couple of 
questions:
- Should UNION_DISTINCT_FACTOR be 1.0 since we only support UNION ALL currently?
{code}
+        if (!all) {
+            rows *= UNION_DISTINCT_FACTOR;
+        }
{code}
- What's the reasoning behind stripSkipScanFilter? Is that removed because its 
effect is already incorporated into the bytes scanned estimate?
- Should RowCountVisitor have a method for distinct? In particular, there's an 
optimization we have when doing a distinct on the leading PK columns which 
impacts cost. This optimization is not identified until runtime, so we might 
need to tweak the code so we know about it at compile time. This could be done 
in a separate patch.
- Somewhat orthogonal to your pull request (but maybe building on top of it), do 
you think it'd be possible to prevent a query that's "too expensive" from 
running (assuming "too expensive" would be identified by a config property)? 
Something to keep in mind - I can file a separate JIRA for this.
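
To make the first question concrete, here is a minimal sketch of how a union row estimate with a distinct factor might look. It is not the code from the patch: the class name UnionRowEstimate, the method estimateUnionRows, and the value 0.5 are assumptions made for illustration. Only the UNION_DISTINCT_FACTOR name and the `!all` branch come from the snippet quoted in that question.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the RowCountVisitor from the patch.
final class UnionRowEstimate {
    // Assumed value for illustration only; the patch may use a different one.
    static final double UNION_DISTINCT_FACTOR = 0.5;

    // Sums the per-branch row estimates, then scales the total down
    // when the union removes duplicates (all == false).
    static double estimateUnionRows(List<Double> branchRows, boolean all) {
        double rows = 0.0;
        for (double r : branchRows) {
            rows += r;
        }
        if (!all) {
            rows *= UNION_DISTINCT_FACTOR;
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Double> branches = Arrays.asList(100.0, 300.0);
        System.out.println(estimateUnionRows(branches, true));
        System.out.println(estimateUnionRows(branches, false));
    }
}
```

In this sketch, as long as only UNION ALL is supported, `all` is always true and the factor never changes the estimate, whatever its value.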

> Base hash versus sort merge join decision on cost
> -------------------------------------------------
>
>                 Key: PHOENIX-1556
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1556
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>            Priority: Major
>              Labels: CostBasedOptimization
>         Attachments: PHOENIX-1556.patch
>
>
> At compile time, we know how many guideposts (i.e. how many bytes) will be 
> scanned for the RHS table. We should, by default, base the decision of using 
> the hash-join versus many-to-many join on this information.
> Another criterion (as we've seen in PHOENIX-4508) is whether or not the tables 
> being joined are already ordered by the join key. In that case, it's better 
> to always use the sort merge join.
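
The decision rule in the description above could be sketched roughly as follows. Everything here is hypothetical: JoinStrategyChooser, choose, and the byte limit are illustrative names and values, not Phoenix APIs or settings, and a real cost-based implementation would likely compare plan costs rather than apply a single threshold. The sketch only assumes that an estimate of the bytes scanned for the RHS table is available at compile time.

```java
// Hypothetical sketch of the decision rule; none of these names are Phoenix APIs.
final class JoinStrategyChooser {
    enum JoinStrategy { HASH_JOIN, SORT_MERGE_JOIN }

    // rhsBytes: estimated bytes scanned for the RHS table (e.g. derived from guideposts).
    // maxHashBytes: assumed limit on how much RHS data a hash join may hold.
    // orderedByJoinKey: true when both inputs are already ordered by the join key.
    static JoinStrategy choose(long rhsBytes, long maxHashBytes, boolean orderedByJoinKey) {
        // Inputs already ordered by the join key: sort merge needs no extra sort.
        if (orderedByJoinKey) {
            return JoinStrategy.SORT_MERGE_JOIN;
        }
        // Otherwise use a hash join only when the RHS is estimated to fit.
        return rhsBytes <= maxHashBytes
                ? JoinStrategy.HASH_JOIN
                : JoinStrategy.SORT_MERGE_JOIN;
    }

    public static void main(String[] args) {
        long limit = 100L * 1024 * 1024; // assumed 100 MB limit, for illustration
        System.out.println(choose(10L * 1024 * 1024, limit, false));
        System.out.println(choose(500L * 1024 * 1024, limit, false));
        System.out.println(choose(10L * 1024 * 1024, limit, true));
    }
}
```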



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
