[Impala-ASF-CR] IMPALA-3063: Separate join inversion from join ordering.

Alex Behm (Code Review) Sun, 14 Aug 2016 23:45:07 -0700

Alex Behm has posted comments on this change.

Change subject: IMPALA-3063: Separate join inversion from join ordering.
......................................................................

Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3846/2//COMMIT_MSG
Commit Message:

Line 7: IMPALA-3063: Separate join inversion from join ordering.
> the changes in test output make it look like some plans also change (so thi
Correct. Let me explain why it is not only a separation of functionality.

1. Our join cardinality estimation is not symmetric, i.e., A JOIN B may not
give the same estimate as B JOIN A due to our FK/PK detection heuristic. In the
context of this patch this means that an inverted join may have a different
cardinality estimate, so plans may change depending on whether the inversion is
done during join ordering of after.

2. We currently only invert outer/semi/anti joins based on the rhs table ref
join op. In this patch I want to preserve the existing behavior as much as
possible, but when doing the join ordering in a separate pass we may see a join
opn in a JoinNode that is different from the rhs table ref. So in some
situations the inversion behavior based on the join op could be different and
there are some examples in this patch.

It's hard to say which method produces better plans, but the post-ordering join
inversion seems simpler and easier to reason about. Maybe you have other
thoughts?

http://gerrit.cloudera.org:8080/#/c/3846/2/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
File testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test:

Line 814: | |--07:HASH JOIN [RIGHT OUTER JOIN]
> why is this cross product flipped?
Good catch. The cardinality estimates are clear: Join 08 should indeed be
inverted.

However, I'm trying to minimize the behavioral changes in this patch and I
missed the case of non-equi inner joins which we should not invert if we want
to preserve the existing behavior.

http://gerrit.cloudera.org:8080/#/c/3846/2/testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test:

Line 1088: | hash predicates: l_suppkey = s_suppkey, l_partkey = ps_partkey
> we've seen meaningless changes like this before. i'm wondering whether we s
We do order them based on cost, but we don't have a tie breaker to get an
absolute order.

I tried using toSql() as a tie breaker which is good enough for the explain use
case (still not an absolute order).
Unfortunately, that changes a ton of plans, so I'd prefer to do that in a
follow-on patch. I believe we even have a JIRA for that. Sound good?

Line 1777: | hash predicates: p_partkey = ps_partkey
> is this the right thing to do, given selectivities?
Another good catch. The change was only coincidentally caused by this patch.
There was an independent bug in estimating cross join cardinalities which lead
us to compute different cardinalities for:

NESTED_LOOP_JOIN
UNNEST
SINGULAR_ROW_SRC

versus.

NESTED_LOOP_JOIN
SINGULAR_ROW_SRC
UNNEST

That difference caused a plan change here because during join ordering we had a
different version than before.

--
To view, visit http://gerrit.cloudera.org:8080/3846
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If86db7753fc585bb4c69612745ec0103278888a4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3063: Separate join inversion from join ordering.

Reply via email to