This is an automated email from the ASF dual-hosted git repository. maxyang pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/cloudberry.git
commit 50ad35a453f648fac5cf8e22277c81cd7d7af9cb Author: Dev Swaroop Chattopadhyay <[email protected]> AuthorDate: Fri Feb 2 23:49:12 2024 +0530 Fix for finding child output columns when parent is union while join pruning (#16960) Fix issue #16932 ## Terminology used: `Derived Output columns:` Output columns defined by each operator `Output Columns:` The intersection of the operator's derived output columns and the operator's parent's combined output pred columns. If the operator is the root operator, its output columns would be the intersection of the operator's derived output columns and the query's output columns. `Combined Output Pred Columns:` A set containing operator's output columns and its predicate's (if present) derived output columns ## Root Cause Analysis ------------------------- Consider the below tree ``` +--A(Logical) |--B (Logical) | |----C (Logical) | |----D (Logical) | +----E (Scalar) | +--F (Scalar) ``` In order to perform left join pruning in ORCA, one of the step is that each operator(Logical) determines two sets. One set containing output columns of the operator and second set containing combined output pred columns: For e.g. in the above tree, operator A will create two sets. One set containing output columns of A (let's call this set S1) and second set containing combined output pred columns (let's call this set S2). In order to determine those two sets the below algorithm is employed (Suppose we are determining for operator A) 1.) The operator A will find it's derived output columns and will take a intersection with the query's output. The common values will be inserted into set S1 and S2. The set S1 now contains the output columns of operator A. 2.)If a scalar operator is present for operator A then the derived output columns of the scalar operator are also inserted into the set S2 (In this case operator F's derived output columns) 3.) Now the set S2 will contain combined output pred columns for operator A. 4.) This set S2 is passed to the next function call in which the output columns of operator B will be determined. Now consider the below query: **Setup:** ``` create table t1(a int primary key, b int) ; create table t2(a int primary key, b int) ; ``` **Query:** ``` explain select t2.b from t1 left join t2 on t1.a = t2.a union all select t2.b from t1 left join t2 on t1.a = t2.a; ``` **Algebrized tree:** ``` +--CLogicalUnionAll Output: ("b" (10)), Input: [("b" (10)), ("b" (28))] |--CLogicalLeftOuterJoin | |--CLogicalGet "t1" ("t1"), Columns: ["a" (0), "b" (1)] | |--CLogicalGet "t2" ("t2"), Columns: ["a" (9), "b" (10)] | +--CScalarCmp (=) | |--CScalarIdent "a" (0) | +--CScalarIdent "a" (9) +--CLogicalLeftOuterJoin |--CLogicalGet "t1" ("t1"), Columns: ["a" (18), "b" (19)] |--CLogicalGet "t2" ("t2"), Columns: ["a" (27), "b" (28)] +--CScalarCmp (=) |--CScalarIdent "a" (18) +--CScalarIdent "a" (27) ``` Based on the above algorithm mentioned The derived output columns of CLogicalSetOp (CLogicalUnion/UnionAll/Intersect/IntersectAll/Except/ExceptAll) are from its outer child. Based on the algorithm mentioned above, CLogicalUnionAll will create two sets. One set containing output columns of itself and second set containing combined output pred columns . The first set will contain b(10) and the second set will also contain b(10). This second set will be passed to the outer and inner CLogicalLeftOuterJoin operator for determining there output columns. The outer CLogicalLeftOuterJoin will find its derived output columns and will intersect it with the set passed by CLogicalUnionAll containing b(10). After intersection we will get b(10) as the output column for outer CLogicalLeftOuterJoin. Similarly the inner CLogicalLeftOuterJoin will find its derived output columns and will intersect it with the set passed by CLogicalUnionAll containing b(10). But here the intersection will lead to a empty output column set for inner CLogicalLeftOuterJoin. Now for pruning a left join in a query, one of the condition that ORCA checks is that all output columns of CLogicalLeftOuterJoin should only belong from the outer relation. The method performing this check doesn't handle the case when the output columns set is empty. Due to this all the checks pass in that method and ORCA proceeds to prune the Join. So basically the issue is that ORCA is not able to properly determine the output columns for the child operators when the parent is CLogicalSetOp which is leading to unwanted pruning of joins. A similar issue was fixed in PR #16690 which ensured that the combined output pred columns of CLogicalSetOp is not empty because an empty set was leading to unwanted pruning of left join. But that fix was not complete. It was incorrectly determining combined output pred columns of CLogicalSetOp which lead to #16932. ## Fix -------- Updating the logic for determining the combined output pred columns of CLogicalSetOp. Since in the case of CLogicalSetOp, the output columns are from the outer child, the inner child will always have empty output columns which is not completely correct. So we need to also include the inner child columns in the combined output pred columns for CLogicalSetOP. --- .abi-check/7.1.0/postgres.symbols.ignore | 1 + .abi-check/7.1.0/postgres.types.ignore | 2 + .../src/operators/CLeftJoinPruningPreprocessor.cpp | 73 ++- src/test/regress/expected/join_gp.out | 473 +++++++++++++++++- src/test/regress/expected/join_gp_optimizer.out | 537 ++++++++++++++++++++- src/test/regress/sql/join_gp.sql | 166 ++++++- 6 files changed, 1195 insertions(+), 57 deletions(-) diff --git a/.abi-check/7.1.0/postgres.symbols.ignore b/.abi-check/7.1.0/postgres.symbols.ignore new file mode 100644 index 0000000000..848dbf2841 --- /dev/null +++ b/.abi-check/7.1.0/postgres.symbols.ignore @@ -0,0 +1 @@ +pgarch_start diff --git a/.abi-check/7.1.0/postgres.types.ignore b/.abi-check/7.1.0/postgres.types.ignore new file mode 100644 index 0000000000..c4ed0044a9 --- /dev/null +++ b/.abi-check/7.1.0/postgres.types.ignore @@ -0,0 +1,2 @@ +enum PMSignalReason +enum AuxProcType diff --git a/src/backend/gporca/libgpopt/src/operators/CLeftJoinPruningPreprocessor.cpp b/src/backend/gporca/libgpopt/src/operators/CLeftJoinPruningPreprocessor.cpp index 8668b0f833..4d48c66c66 100644 --- a/src/backend/gporca/libgpopt/src/operators/CLeftJoinPruningPreprocessor.cpp +++ b/src/backend/gporca/libgpopt/src/operators/CLeftJoinPruningPreprocessor.cpp @@ -12,6 +12,7 @@ #include "gpopt/operators/CLeftJoinPruningPreprocessor.h" #include "gpopt/base/CColRefSetIter.h" +#include "gpopt/operators/CLogicalSetOp.h" #include "gpopt/operators/CPredicateUtils.h" #include "gpopt/operators/CScalarSubquery.h" #include "gpopt/operators/CScalarSubqueryQuantified.h" @@ -95,6 +96,13 @@ CLeftJoinPruningPreprocessor::CheckJoinPruningCondOnInnerRel( GPOS_ASSERT(nullptr != pexprNew); GPOS_ASSERT(nullptr != output_columns); + // if the output_columns is empty for a CLogicalLeftOuterJoin, then don't + // prune the join + if (0 == output_columns->Size()) + { + return false; + } + CExpression *inner_rel = (*pexprNew)[1]; const CColRefSet *derive_output_columns_inner_rel = inner_rel->DeriveOutputColumns(); @@ -397,13 +405,13 @@ CLeftJoinPruningPreprocessor::PexprJoinPruningScalarSubquery( void CLeftJoinPruningPreprocessor::ComputeOutputColumns( const CExpression *pexpr, const CColRefSet *derived_output_columns, - CColRefSet *output_columns, CColRefSet *childs_output_columns, + CColRefSet *output_columns, CColRefSet *combined_output_pred_columns, const CColRefSet *pcrsOutput) { GPOS_ASSERT(nullptr != pexpr); GPOS_ASSERT(nullptr != derived_output_columns); GPOS_ASSERT(nullptr != output_columns); - GPOS_ASSERT(nullptr != childs_output_columns); + GPOS_ASSERT(nullptr != combined_output_pred_columns); // Computing output columns of the parent CColRefSetIter iter_derived_output_columns(*derived_output_columns); @@ -417,7 +425,48 @@ CLeftJoinPruningPreprocessor::ComputeOutputColumns( } // Computing output columns of the child tree - childs_output_columns->Include(output_columns); + combined_output_pred_columns->Include(output_columns); + + // Consider the below algebrized tree + // CLogicalUnionAll Output: ("b" (10)), Input: [("b" (10)), ("b" (28))] + // |--CLogicalLeftOuterJoin + // | |--CLogicalGet "t1" ("t1"), Columns: ["a" (0), "b" (1)] + // | |--CLogicalGet "t2" ("t2"), Columns: ["a" (9), "b" (10)] + // | +--CScalarCmp (=) + // | |--CScalarIdent "a" (0) + // | +--CScalarIdent "a" (9) + // +--CLogicalLeftOuterJoin + // |--CLogicalGet "t1" ("t1"), Columns: ["a" (18), "b" (19)] + // |--CLogicalGet "t2" ("t2"), Columns: ["a" (27), "b" (28)] + // +--CScalarCmp (=) + // |--CScalarIdent "a" (18) + // +--CScalarIdent "a" (27) + // In this case the output_columns of CLogicalUnionAll will only contain + // b(10). In the case of CLogicalUnion or CLogicalUnionAll , the output + // columns are from the outer child. Due to this the output_columns of inner + // CLogicalLeftOuterJoin will be empty. So join pruning will not be possible + // in such cases. The Input in CLogicalUnion/CLogicalUnionAll signifies the + // columns from the childs. So including those columns in the + // combined_output_pred_columns of CLogicalUnionAll to perform pruning of + // the inner child if possible. Similar is the case for + // Intersect/IntersectAll and Difference/DifferenceAll. + + if (CUtils::FLogicalSetOp(pexpr->Pop())) + { + CLogicalSetOp *pop = CLogicalSetOp::PopConvert(pexpr->Pop()); + // Starting from index 1, since the 0th input column array is the same + // as the output column array + for (ULONG i = 1; i < pop->PdrgpdrgpcrInput()->Size(); i++) + { + CColRefArray *pdrgpcrInput = (*pop->PdrgpdrgpcrInput())[i]; + for (ULONG j = 0; j < pdrgpcrInput->Size(); j++) + { + combined_output_pred_columns->Include((*pdrgpcrInput)[j]); + } + } + } + + ULONG arity = pexpr->Arity(); for (ULONG ul = 0; ul < arity; ul++) { @@ -426,7 +475,7 @@ CLeftJoinPruningPreprocessor::ComputeOutputColumns( { CColRefSet *derived_used_columns_scalar = pexpr_child->DeriveUsedColumns(); - childs_output_columns->Include(derived_used_columns_scalar); + combined_output_pred_columns->Include(derived_used_columns_scalar); } } } @@ -434,7 +483,7 @@ CLeftJoinPruningPreprocessor::ComputeOutputColumns( CExpression * CLeftJoinPruningPreprocessor::JoinPruningTreeTraversal( CMemoryPool *mp, const CExpression *pexpr, CExpressionArray *pdrgpexpr, - const CColRefSet *childs_output_columns) + const CColRefSet *combined_output_pred_columns) { GPOS_ASSERT(nullptr != pexpr); GPOS_ASSERT(nullptr != pdrgpexpr); @@ -446,7 +495,7 @@ CLeftJoinPruningPreprocessor::JoinPruningTreeTraversal( if (pexpr_child->Pop()->FLogical()) { CExpression *pexprLogicalJoinPrunedChild = - PexprPreprocess(mp, pexpr_child, childs_output_columns); + PexprPreprocess(mp, pexpr_child, combined_output_pred_columns); pdrgpexpr->Append(pexprLogicalJoinPrunedChild); } else if (pexpr_child->DeriveHasSubquery()) @@ -543,20 +592,20 @@ CLeftJoinPruningPreprocessor::PexprPreprocess(CMemoryPool *mp, CColRefSet *output_columns = GPOS_NEW(mp) CColRefSet(mp); // Columns which are output by the child tree of the current expression - CColRefSet *childs_output_columns = GPOS_NEW(mp) CColRefSet(mp); + CColRefSet *combined_output_pred_columns = GPOS_NEW(mp) CColRefSet(mp); // Computing output columns of the current expression (output_columns) - // and the output columns of child tree (childs_output_columns) + // and the output columns of child tree (combined_output_pred_columns) ComputeOutputColumns(pexpr, derived_output_columns, output_columns, - childs_output_columns, pcrsOutput); + combined_output_pred_columns, pcrsOutput); // Array to hold the child expressions CExpressionArray *pdrgpexpr = GPOS_NEW(mp) CExpressionArray(mp); // Traversing the tree - CExpression *pexprNew = - JoinPruningTreeTraversal(mp, pexpr, pdrgpexpr, childs_output_columns); - childs_output_columns->Release(); + CExpression *pexprNew = JoinPruningTreeTraversal( + mp, pexpr, pdrgpexpr, combined_output_pred_columns); + combined_output_pred_columns->Release(); // Checking if the join is a left join then pruning the join if possible if (CPredicateUtils::FLeftOuterJoin(pexprNew)) diff --git a/src/test/regress/expected/join_gp.out b/src/test/regress/expected/join_gp.out index d03be4be93..52adf1eaa0 100644 --- a/src/test/regress/expected/join_gp.out +++ b/src/test/regress/expected/join_gp.out @@ -2081,32 +2081,467 @@ drop table barJoinPruning; -- -- Cases where join under union -- -create table foo(a int primary key, b int); -create table bar(a int primary key, b int); -explain select foo.a, bar.b from foo left join bar on foo.a = bar.a - union - select foo.a, bar.b from foo join bar on foo.a = bar.a; - QUERY PLAN -------------------------------------------------------------------------------------------------------- - Gather Motion 3:1 (slice1; segments: 3) (cost=5022.18..7892.18 rows=172200 width=8) - -> HashAggregate (cost=5022.18..5596.18 rows=57400 width=8) +create table foo(a int primary key, b int,c int); +create table bar(a int primary key, b int,c int); +insert into foo values (1,1,10),(2,1,10),(3,2,20),(4,2,30),(5,2,30),(6,NULL,NULL),(7,NULL,3); +insert into bar values (1,1,10),(2,2,20),(3,NULL,NULL),(4,3,NULL),(5,1,10); +analyze foo,bar; +explain (costs off) select foo.a, bar.b from foo left join bar on foo.a = bar.a +union select foo.a, bar.b from foo join bar on foo.a = bar.a; + QUERY PLAN +------------------------------------------------------------ + Gather Motion 3:1 (slice1; segments: 3) + -> HashAggregate Group Key: foo.a, bar.b - -> Redistribute Motion 3:3 (slice2; segments: 3) (cost=679.75..4735.18 rows=57400 width=8) + -> Redistribute Motion 3:3 (slice2; segments: 3) Hash Key: foo.a, bar.b - -> Append (cost=679.75..3587.18 rows=57400 width=8) - -> Hash Left Join (cost=679.75..1363.09 rows=28700 width=8) + -> Append + -> Hash Left Join Hash Cond: (foo.a = bar.a) - -> Seq Scan on foo (cost=0.00..321.00 rows=28700 width=4) - -> Hash (cost=321.00..321.00 rows=28700 width=8) - -> Seq Scan on bar (cost=0.00..321.00 rows=28700 width=8) - -> Hash Join (cost=679.75..1363.09 rows=28700 width=8) + -> Seq Scan on foo + -> Hash + -> Seq Scan on bar + -> Hash Join Hash Cond: (foo_1.a = bar_1.a) - -> Seq Scan on foo foo_1 (cost=0.00..321.00 rows=28700 width=4) - -> Hash (cost=321.00..321.00 rows=28700 width=8) - -> Seq Scan on bar bar_1 (cost=0.00..321.00 rows=28700 width=8) + -> Seq Scan on foo foo_1 + -> Hash + -> Seq Scan on bar bar_1 Optimizer: Postgres-based planner (17 rows) +------------------------------------- +-- CASES WHERE JOIN WILL BE PRUNED -- +------------------------------------- +-------------------------------------------------------------------------------- +-- join under UNION +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a union +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +------------------------------------------------------------ + Gather Motion 3:1 (slice1; segments: 3) + -> HashAggregate + Group Key: foo.b, foo.c + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: foo.b, foo.c + -> Append + -> Seq Scan on foo + -> Seq Scan on bar + Optimizer: Postgres-based planner +(9 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a union +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + | + 2 | 30 + | 3 + 2 | 20 + 3 | + 1 | 10 +(6 rows) + +-------------------------------------------------------------------------------- +-- join under UNION ALL +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a union all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +------------------------------------------ + Gather Motion 3:1 (slice1; segments: 3) + -> Append + -> Seq Scan on foo + -> Seq Scan on bar + Optimizer: Postgres-based planner +(5 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a union all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + 2 | 30 + | + 1 | 10 + 1 | 10 + 1 | 10 + 1 | 10 + 2 | 20 + 2 | 30 + | 3 + 2 | 20 + | + 3 | +(12 rows) + +-------------------------------------------------------------------------------- +-- join under INTERSECT +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice1; segments: 3) + -> HashSetOp Intersect + -> Append + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: "*SELECT* 2".b, "*SELECT* 2".c + -> Subquery Scan on "*SELECT* 2" + -> Seq Scan on bar + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: "*SELECT* 1".b, "*SELECT* 1".c + -> Subquery Scan on "*SELECT* 1" + -> Seq Scan on foo + Optimizer: Postgres-based planner +(12 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + | + 1 | 10 + 2 | 20 +(3 rows) + +-------------------------------------------------------------------------------- +-- join under INTERSECT ALL +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice1; segments: 3) + -> HashSetOp Intersect All + -> Append + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: "*SELECT* 2".b, "*SELECT* 2".c + -> Subquery Scan on "*SELECT* 2" + -> Seq Scan on bar + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: "*SELECT* 1".b, "*SELECT* 1".c + -> Subquery Scan on "*SELECT* 1" + -> Seq Scan on foo + Optimizer: Postgres-based planner +(12 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + | + 1 | 10 + 1 | 10 + 2 | 20 +(4 rows) + +-------------------------------------------------------------------------------- +-- join under EXCEPT +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a except +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice1; segments: 3) + -> HashSetOp Except + -> Append + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: "*SELECT* 1".b, "*SELECT* 1".c + -> Subquery Scan on "*SELECT* 1" + -> Seq Scan on foo + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: "*SELECT* 2".b, "*SELECT* 2".c + -> Subquery Scan on "*SELECT* 2" + -> Seq Scan on bar + Optimizer: Postgres-based planner +(12 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a except +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + | 3 + 2 | 30 +(2 rows) + +-------------------------------------------------------------------------------- +-- join under EXCEPT ALL +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a except all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice1; segments: 3) + -> HashSetOp Except All + -> Append + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: "*SELECT* 1".b, "*SELECT* 1".c + -> Subquery Scan on "*SELECT* 1" + -> Seq Scan on foo + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: "*SELECT* 2".b, "*SELECT* 2".c + -> Subquery Scan on "*SELECT* 2" + -> Seq Scan on bar + Optimizer: Postgres-based planner +(12 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a except all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + | 3 + 2 | 30 + 2 | 30 +(3 rows) + +------------------------------------------ +-- CASES WHERE JOIN WILL NOT BE PRUNED -- +------------------------------------------ +-------------------------------------------------------------------------------- +-- join under UNION +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a union +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +------------------------------------------------------------ + Gather Motion 3:1 (slice1; segments: 3) + -> HashAggregate + Group Key: foo.b, foo.c + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: foo.b, foo.c + -> Append + -> Seq Scan on foo + -> Hash Right Join + Hash Cond: (foo_1.a = bar.a) + -> Seq Scan on foo foo_1 + -> Hash + -> Seq Scan on bar + Optimizer: Postgres-based planner +(13 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a union +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + 2 | 10 + | + 2 | 30 + | 3 + 3 | 30 + 1 | 10 + 1 | 30 + 2 | 20 + | 20 +(9 rows) + +-------------------------------------------------------------------------------- +-- join under UNION ALL +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a union all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +-------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) + -> Append + -> Seq Scan on foo + -> Hash Right Join + Hash Cond: (foo_1.a = bar.a) + -> Seq Scan on foo foo_1 + -> Hash + -> Seq Scan on bar + Optimizer: Postgres-based planner +(9 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a union all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + 1 | 10 + 2 | 20 + 2 | 30 + | 3 + 2 | 10 + | 20 + 3 | 30 + 2 | 30 + | + 1 | 30 + 1 | 10 + 1 | 10 +(12 rows) + +-------------------------------------------------------------------------------- +-- join under INTERSECT +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice1; segments: 3) + -> HashSetOp Intersect + -> Append + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: "*SELECT* 2".b, "*SELECT* 2".c + -> Subquery Scan on "*SELECT* 2" + -> Hash Right Join + Hash Cond: (foo.a = bar.a) + -> Seq Scan on foo + -> Hash + -> Seq Scan on bar + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: "*SELECT* 1".b, "*SELECT* 1".c + -> Subquery Scan on "*SELECT* 1" + -> Seq Scan on foo foo_1 + Optimizer: Postgres-based planner +(16 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + 1 | 10 +(1 row) + +-------------------------------------------------------------------------------- +-- join under INTERSECT ALL +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice1; segments: 3) + -> HashSetOp Intersect All + -> Append + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: "*SELECT* 2".b, "*SELECT* 2".c + -> Subquery Scan on "*SELECT* 2" + -> Hash Right Join + Hash Cond: (foo.a = bar.a) + -> Seq Scan on foo + -> Hash + -> Seq Scan on bar + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: "*SELECT* 1".b, "*SELECT* 1".c + -> Subquery Scan on "*SELECT* 1" + -> Seq Scan on foo foo_1 + Optimizer: Postgres-based planner +(16 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + 1 | 10 +(1 row) + +-------------------------------------------------------------------------------- +-- join under EXCEPT +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a except +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice1; segments: 3) + -> HashSetOp Except + -> Append + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: "*SELECT* 1".b, "*SELECT* 1".c + -> Subquery Scan on "*SELECT* 1" + -> Seq Scan on foo + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: "*SELECT* 2".b, "*SELECT* 2".c + -> Subquery Scan on "*SELECT* 2" + -> Hash Right Join + Hash Cond: (foo_1.a = bar.a) + -> Seq Scan on foo foo_1 + -> Hash + -> Seq Scan on bar + Optimizer: Postgres-based planner +(16 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a except +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + | 3 + | + 2 | 30 + 2 | 20 +(4 rows) + +-------------------------------------------------------------------------------- +-- join under EXCEPT ALL +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a except all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +------------------------------------------------------------------ + Gather Motion 3:1 (slice1; segments: 3) + -> HashSetOp Except All + -> Append + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: "*SELECT* 1".b, "*SELECT* 1".c + -> Subquery Scan on "*SELECT* 1" + -> Seq Scan on foo + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: "*SELECT* 2".b, "*SELECT* 2".c + -> Subquery Scan on "*SELECT* 2" + -> Hash Right Join + Hash Cond: (foo_1.a = bar.a) + -> Seq Scan on foo foo_1 + -> Hash + -> Seq Scan on bar + Optimizer: Postgres-based planner +(16 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a except all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + | 3 + | + 2 | 30 + 2 | 30 + 2 | 20 + 1 | 10 +(6 rows) + drop table foo; drop table bar; ----------------------------------------------------------------- diff --git a/src/test/regress/expected/join_gp_optimizer.out b/src/test/regress/expected/join_gp_optimizer.out index efc8f8af6c..2cd20791b4 100644 --- a/src/test/regress/expected/join_gp_optimizer.out +++ b/src/test/regress/expected/join_gp_optimizer.out @@ -2072,32 +2072,527 @@ drop table barJoinPruning; -- -- Cases where join under union -- -create table foo(a int primary key, b int); -create table bar(a int primary key, b int); -explain select foo.a, bar.b from foo left join bar on foo.a = bar.a - union - select foo.a, bar.b from foo join bar on foo.a = bar.a; - QUERY PLAN --------------------------------------------------------------------------------------------------------- - Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..874.00 rows=1 width=8) - -> GroupAggregate (cost=0.00..874.00 rows=1 width=8) +create table foo(a int primary key, b int,c int); +create table bar(a int primary key, b int,c int); +insert into foo values (1,1,10),(2,1,10),(3,2,20),(4,2,30),(5,2,30),(6,NULL,NULL),(7,NULL,3); +insert into bar values (1,1,10),(2,2,20),(3,NULL,NULL),(4,3,NULL),(5,1,10); +analyze foo,bar; +explain (costs off) select foo.a, bar.b from foo left join bar on foo.a = bar.a +union select foo.a, bar.b from foo join bar on foo.a = bar.a; + QUERY PLAN +---------------------------------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) + -> GroupAggregate Group Key: foo.a, bar.b - -> Sort (cost=0.00..874.00 rows=1 width=8) + -> Sort Sort Key: foo.a, bar.b - -> Append (cost=0.00..874.00 rows=1 width=8) - -> Nested Loop Left Join (cost=0.00..437.00 rows=1 width=8) - Join Filter: true - -> Seq Scan on foo (cost=0.00..431.00 rows=1 width=4) - -> Index Scan using bar_pkey on bar (cost=0.00..6.00 rows=1 width=4) - Index Cond: (a = foo.a) - -> Nested Loop (cost=0.00..437.00 rows=1 width=8) - Join Filter: true - -> Seq Scan on foo foo_1 (cost=0.00..431.00 rows=1 width=4) - -> Index Scan using bar_pkey on bar bar_1 (cost=0.00..6.00 rows=1 width=4) - Index Cond: (a = foo_1.a) + -> Append + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: foo.a, bar.b + -> Nested Loop Left Join + Join Filter: true + -> Seq Scan on foo + -> Index Scan using bar_pkey on bar + Index Cond: (a = foo.a) + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: foo_1.a, bar_1.b + -> Nested Loop + Join Filter: true + -> Seq Scan on bar bar_1 + -> Index Scan using foo_pkey on foo foo_1 + Index Cond: (a = bar_1.a) + Optimizer: GPORCA +(21 rows) + +------------------------------------- +-- CASES WHERE JOIN WILL BE PRUNED -- +------------------------------------- +-------------------------------------------------------------------------------- +-- join under UNION +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a union +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +------------------------------------------------------------------------ + Gather Motion 3:1 (slice1; segments: 3) + -> GroupAggregate + Group Key: foo.b, foo.c + -> Sort + Sort Key: foo.b, foo.c + -> Append + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: foo.b, foo.c + -> Seq Scan on foo + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: bar.b, bar.c + -> Seq Scan on bar + Optimizer: GPORCA +(13 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a union +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + 2 | 30 + | 3 + | + 1 | 10 + 2 | 20 + 3 | +(6 rows) + +-------------------------------------------------------------------------------- +-- join under UNION ALL +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a union all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +------------------------------------------ + Gather Motion 3:1 (slice1; segments: 3) + -> Append + -> Seq Scan on foo + -> Seq Scan on bar + Optimizer: GPORCA +(5 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a union all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + 1 | 10 + 1 | 10 + 1 | 10 + 2 | 20 + 2 | 30 + | 3 + 2 | 20 + | + 3 | + 2 | 30 + | + 1 | 10 +(12 rows) + +-------------------------------------------------------------------------------- +-- join under INTERSECT +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +---------------------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) + -> Hash Join + Hash Cond: ((NOT (foo.b IS DISTINCT FROM bar.b)) AND (NOT (foo.c IS DISTINCT FROM bar.c))) + -> GroupAggregate + Group Key: foo.b, foo.c + -> Sort + Sort Key: foo.b, foo.c + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: foo.b, foo.c + -> Seq Scan on foo + -> Hash + -> GroupAggregate + Group Key: bar.b, bar.c + -> Sort + Sort Key: bar.b, bar.c + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: bar.b, bar.c + -> Seq Scan on bar + Optimizer: GPORCA +(19 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + | + 1 | 10 + 2 | 20 +(3 rows) + +-------------------------------------------------------------------------------- +-- join under INTERSECT ALL +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) + -> Hash Semi Join + Hash Cond: ((NOT (foo.b IS DISTINCT FROM bar.b)) AND (NOT (foo.c IS DISTINCT FROM bar.c)) AND (NOT ((row_number() OVER (?)) IS DISTINCT FROM (row_number() OVER (?))))) + -> WindowAgg + Partition By: foo.b, foo.c + -> Sort + Sort Key: foo.b, foo.c + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: foo.b, foo.c + -> Seq Scan on foo + -> Hash + -> WindowAgg + Partition By: bar.b, bar.c + -> Sort + Sort Key: bar.b, bar.c + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: bar.b, bar.c + -> Seq Scan on bar + Optimizer: GPORCA +(19 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + | + 1 | 10 + 1 | 10 + 2 | 20 +(4 rows) + +-------------------------------------------------------------------------------- +-- join under EXCEPT +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a except +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +---------------------------------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) + -> GroupAggregate + Group Key: foo.b, foo.c + -> Sort + Sort Key: foo.b, foo.c + -> Hash Anti Join + Hash Cond: ((NOT (foo.b IS DISTINCT FROM bar.b)) AND (NOT (foo.c IS DISTINCT FROM bar.c))) + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: foo.b, foo.c + -> Seq Scan on foo + -> Hash + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: bar.b, bar.c + -> Seq Scan on bar + Optimizer: GPORCA +(15 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a except +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + 2 | 30 + | 3 +(2 rows) + +-------------------------------------------------------------------------------- +-- join under EXCEPT ALL +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a except all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) + -> Hash Anti Join + Hash Cond: ((NOT (foo.b IS DISTINCT FROM bar.b)) AND (NOT (foo.c IS DISTINCT FROM bar.c)) AND (NOT ((row_number() OVER (?)) IS DISTINCT FROM (row_number() OVER (?))))) + -> WindowAgg + Partition By: foo.b, foo.c + -> Sort + Sort Key: foo.b, foo.c + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: foo.b, foo.c + -> Seq Scan on foo + -> Hash + -> WindowAgg + Partition By: bar.b, bar.c + -> Sort + Sort Key: bar.b, bar.c + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: bar.b, bar.c + -> Seq Scan on bar + Optimizer: GPORCA +(19 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a except all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + 2 | 30 + 2 | 30 + | 3 +(3 rows) + +------------------------------------------ +-- CASES WHERE JOIN WILL NOT BE PRUNED -- +------------------------------------------ +-------------------------------------------------------------------------------- +-- join under UNION +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a union +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +---------------------------------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) + -> GroupAggregate + Group Key: foo.b, foo.c + -> Sort + Sort Key: foo.b, foo.c + -> Append + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: foo.b, foo.c + -> Seq Scan on foo + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: bar.b, foo_1.c + -> Nested Loop Left Join + Join Filter: true + -> Seq Scan on bar + -> Index Scan using foo_pkey on foo foo_1 + Index Cond: (a = bar.a) Optimizer: GPORCA (17 rows) +select foo.b,foo.c from foo left join bar on foo.a=bar.a union +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + 1 | 10 + 1 | 30 + 2 | 20 + | 20 + 2 | 10 + 2 | 30 + 3 | 30 + | 3 + | +(9 rows) + +-------------------------------------------------------------------------------- +-- join under UNION ALL +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a union all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +---------------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) + -> Append + -> Seq Scan on foo + -> Nested Loop Left Join + Join Filter: true + -> Seq Scan on bar + -> Index Scan using foo_pkey on foo foo_1 + Index Cond: (a = bar.a) + Optimizer: GPORCA +(9 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a union all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + 2 | 30 + | + 1 | 30 + 1 | 10 + 2 | 20 + 2 | 30 + | 3 + 2 | 10 + | 20 + 3 | 30 + 1 | 10 + 1 | 10 +(12 rows) + +-------------------------------------------------------------------------------- +-- join under INTERSECT +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +------------------------------------------------------------------------------------------------------ + Gather Motion 3:1 (slice1; segments: 3) + -> Hash Join + Hash Cond: ((NOT (foo.b IS DISTINCT FROM bar.b)) AND (NOT (foo.c IS DISTINCT FROM foo_1.c))) + -> GroupAggregate + Group Key: foo.b, foo.c + -> Sort + Sort Key: foo.b, foo.c + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: foo.b, foo.c + -> Seq Scan on foo + -> Hash + -> GroupAggregate + Group Key: bar.b, foo_1.c + -> Sort + Sort Key: bar.b, foo_1.c + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: bar.b, foo_1.c + -> Nested Loop Left Join + Join Filter: true + -> Seq Scan on bar + -> Index Scan using foo_pkey on foo foo_1 + Index Cond: (a = bar.a) + Optimizer: GPORCA +(23 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + 1 | 10 +(1 row) + +-------------------------------------------------------------------------------- +-- join under INTERSECT ALL +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) + -> Hash Semi Join + Hash Cond: ((NOT (foo.b IS DISTINCT FROM bar.b)) AND (NOT (foo.c IS DISTINCT FROM foo_1.c)) AND (NOT ((row_number() OVER (?)) IS DISTINCT FROM (row_number() OVER (?))))) + -> WindowAgg + Partition By: foo.b, foo.c + -> Sort + Sort Key: foo.b, foo.c + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: foo.b, foo.c + -> Seq Scan on foo + -> Hash + -> WindowAgg + Partition By: bar.b, foo_1.c + -> Sort + Sort Key: bar.b, foo_1.c + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: bar.b, foo_1.c + -> Nested Loop Left Join + Join Filter: true + -> Seq Scan on bar + -> Index Scan using foo_pkey on foo foo_1 + Index Cond: (a = bar.a) + Optimizer: GPORCA +(23 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + 1 | 10 +(1 row) + +-------------------------------------------------------------------------------- +-- join under EXCEPT +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a except +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +------------------------------------------------------------------------------------------------------------------ + Gather Motion 3:1 (slice1; segments: 3) + -> GroupAggregate + Group Key: foo.b, foo.c + -> Sort + Sort Key: foo.b, foo.c + -> Hash Anti Join + Hash Cond: ((NOT (foo.b IS DISTINCT FROM bar.b)) AND (NOT (foo.c IS DISTINCT FROM foo_1.c))) + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: foo.b, foo.c + -> Seq Scan on foo + -> Hash + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: bar.b, foo_1.c + -> Nested Loop Left Join + Join Filter: true + -> Seq Scan on bar + -> Index Scan using foo_pkey on foo foo_1 + Index Cond: (a = bar.a) + Optimizer: GPORCA +(19 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a except +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + 2 | 20 + 2 | 30 + | 3 + | +(4 rows) + +-------------------------------------------------------------------------------- +-- join under EXCEPT ALL +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a except all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + QUERY PLAN +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + Gather Motion 3:1 (slice1; segments: 3) + -> Hash Anti Join + Hash Cond: ((NOT (foo.b IS DISTINCT FROM bar.b)) AND (NOT (foo.c IS DISTINCT FROM foo_1.c)) AND (NOT ((row_number() OVER (?)) IS DISTINCT FROM (row_number() OVER (?))))) + -> WindowAgg + Partition By: foo.b, foo.c + -> Sort + Sort Key: foo.b, foo.c + -> Redistribute Motion 3:3 (slice2; segments: 3) + Hash Key: foo.b, foo.c + -> Seq Scan on foo + -> Hash + -> WindowAgg + Partition By: bar.b, foo_1.c + -> Sort + Sort Key: bar.b, foo_1.c + -> Redistribute Motion 3:3 (slice3; segments: 3) + Hash Key: bar.b, foo_1.c + -> Nested Loop Left Join + Join Filter: true + -> Seq Scan on bar + -> Index Scan using foo_pkey on foo foo_1 + Index Cond: (a = bar.a) + Optimizer: GPORCA +(23 rows) + +select foo.b,foo.c from foo left join bar on foo.a=bar.a except all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + b | c +---+---- + 2 | 30 + 2 | 30 + | 3 + | + 1 | 10 + 2 | 20 +(6 rows) + drop table foo; drop table bar; ----------------------------------------------------------------- diff --git a/src/test/regress/sql/join_gp.sql b/src/test/regress/sql/join_gp.sql index b2824cc9d8..3061391c99 100644 --- a/src/test/regress/sql/join_gp.sql +++ b/src/test/regress/sql/join_gp.sql @@ -860,11 +860,167 @@ drop table barJoinPruning; -- -- Cases where join under union -- -create table foo(a int primary key, b int); -create table bar(a int primary key, b int); -explain select foo.a, bar.b from foo left join bar on foo.a = bar.a - union - select foo.a, bar.b from foo join bar on foo.a = bar.a; +create table foo(a int primary key, b int,c int); +create table bar(a int primary key, b int,c int); +insert into foo values (1,1,10),(2,1,10),(3,2,20),(4,2,30),(5,2,30),(6,NULL,NULL),(7,NULL,3); +insert into bar values (1,1,10),(2,2,20),(3,NULL,NULL),(4,3,NULL),(5,1,10); +analyze foo,bar; + +explain (costs off) select foo.a, bar.b from foo left join bar on foo.a = bar.a +union select foo.a, bar.b from foo join bar on foo.a = bar.a; + +------------------------------------- +-- CASES WHERE JOIN WILL BE PRUNED -- +------------------------------------- + +-------------------------------------------------------------------------------- +-- join under UNION +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a union +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + +select foo.b,foo.c from foo left join bar on foo.a=bar.a union +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + +-------------------------------------------------------------------------------- +-- join under UNION ALL +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a union all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + +select foo.b,foo.c from foo left join bar on foo.a=bar.a union all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + +-------------------------------------------------------------------------------- +-- join under INTERSECT +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + +select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + +-------------------------------------------------------------------------------- +-- join under INTERSECT ALL +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + +select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + +-------------------------------------------------------------------------------- +-- join under EXCEPT +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a except +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + +select foo.b,foo.c from foo left join bar on foo.a=bar.a except +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + +-------------------------------------------------------------------------------- +-- join under EXCEPT ALL +-- For the below query the output columns of both the CLogicalLeftOuterJoin +-- are from the outer relation, so we can prune both the joins +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a except all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + +select foo.b,foo.c from foo left join bar on foo.a=bar.a except all +select bar.b,bar.c from bar left join foo on foo.a=bar.a; + +------------------------------------------ +-- CASES WHERE JOIN WILL NOT BE PRUNED -- +------------------------------------------ + +-------------------------------------------------------------------------------- +-- join under UNION +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a union +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + +select foo.b,foo.c from foo left join bar on foo.a=bar.a union +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + +-------------------------------------------------------------------------------- +-- join under UNION ALL +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a union all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + +select foo.b,foo.c from foo left join bar on foo.a=bar.a union all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + +-------------------------------------------------------------------------------- +-- join under INTERSECT +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + +select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + +-------------------------------------------------------------------------------- +-- join under INTERSECT ALL +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + +select foo.b,foo.c from foo left join bar on foo.a=bar.a intersect all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + +-------------------------------------------------------------------------------- +-- join under EXCEPT +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a except +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + +select foo.b,foo.c from foo left join bar on foo.a=bar.a except +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + +-------------------------------------------------------------------------------- +-- join under EXCEPT ALL +-- For the below query since for the outer CLogicalLeftOuterJoin, all the output +-- columns are from the outer relation, the outer join can be pruned but for the +-- inner CLogicalLeftOuterJoin the output column contains columns from +-- inner relation.So the inner join can't be pruned. +-------------------------------------------------------------------------------- +explain (costs off) select foo.b,foo.c from foo left join bar on foo.a=bar.a except all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + +select foo.b,foo.c from foo left join bar on foo.a=bar.a except all +select bar.b,foo.c from bar left join foo on foo.a=bar.a; + drop table foo; drop table bar; ----------------------------------------------------------------- --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
