[
https://issues.apache.org/jira/browse/DRILL-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482450#comment-16482450
]
ASF GitHub Bot commented on DRILL-6374:
---------------------------------------
vdiravka commented on a change in pull request #1262: DRILL-6374: Transitive
Closure leads to TPCH Queries regressions and OOM when run concurrency test
URL: https://github.com/apache/drill/pull/1262#discussion_r189566458
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillFilterJoinRules.java
##########
@@ -59,20 +58,44 @@ public boolean apply(Join join, JoinRelType joinType, RexNode exp) {
}
};
+ /** Predicate that always returns true for any filter in an OUTER join, and returns true only
+ * for strict EQUAL or IS_DISTINCT_FROM conditions (without functions) over RexInputRef in an INNER join.
+ * With this predicate, a filter expression for which it returns true will be kept in the JOIN OP.
+ * Example: INNER JOIN, L.C1 = R.C2 will be kept in JOIN.
+ * L.C3 + 100 = R.C4 + 100 and L.C5 < R.C6 will be pulled up into Filter above JOIN.
+ * OUTER JOIN, Keep any filter in JOIN.
+ */
+ public static final FilterJoinRule.Predicate STRICT_EQUAL_IS_DISTINCT_FROM =
Review comment:
No, this expression will be pushed into the Filter above the join.
If the predicate you mentioned is left in the join condition, Drill
fails at a later planning stage.
Previously, this expression was factored out into the Filter above by
`DrillJoinRule.INSTANCE` from `staticRuleSet` in the main LOGICAL stage:
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerPhase.java#L313
But in `TRANSITIVE_CLOSURE`, which runs after the main `LOGICAL` stage, the
`DrillFilterJoinRules.DRILL_FILTER_INTO_JOIN` rule with the usual
`EQUAL_IS_DISTINCT_FROM` predicate would put such an expression back into the
join condition. The new predicate was added to prevent that.
I have edited the description slightly to make it clearer. Please
let me know if the description of the predicate needs further improvement.
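
To make the intended behaviour of the new predicate concrete, here is a minimal sketch of the rule as the Javadoc describes it. It is a toy model and not the code from the pull request: `StrictJoinPredicateSketch`, `Expr`, `Op`, `JoinType` and `keepInJoin` are hypothetical stand-ins for Calcite's `RexNode`, `SqlKind`, `JoinRelType` and `FilterJoinRule.Predicate`, so that the example runs without Calcite on the classpath.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model only: every type here is a hypothetical stand-in, not Calcite or Drill code. */
final class StrictJoinPredicateSketch {

    enum JoinType { INNER, LEFT, RIGHT, FULL }

    enum Op { EQUALS, IS_DISTINCT_FROM, LESS_THAN, PLUS }

    /** Minimal expression node: a column reference, a literal, or an operator call. */
    static final class Expr {
        final Op op;                // non-null only for operator calls
        final String column;        // non-null only for column references
        final List<Expr> operands;  // empty for column references and literals

        private Expr(Op op, String column, List<Expr> operands) {
            this.op = op;
            this.column = column;
            this.operands = operands;
        }

        static Expr col(String name) {
            return new Expr(null, name, Collections.<Expr>emptyList());
        }

        static Expr lit() {
            return new Expr(null, null, Collections.<Expr>emptyList());
        }

        static Expr call(Op op, Expr... operands) {
            return new Expr(op, null, Arrays.asList(operands));
        }

        boolean isColumnRef() {
            return column != null;
        }
    }

    /**
     * Returns true if the condition may stay in the join.
     * OUTER joins keep every condition; INNER joins keep only EQUALS or
     * IS_DISTINCT_FROM whose operands are all plain column references.
     */
    static boolean keepInJoin(JoinType joinType, Expr exp) {
        if (joinType != JoinType.INNER) {
            return true;
        }
        if (exp.op != Op.EQUALS && exp.op != Op.IS_DISTINCT_FROM) {
            return false;
        }
        for (Expr operand : exp.operands) {
            if (!operand.isColumnRef()) {
                return false; // e.g. L.C3 + 100 is a function call, not a column
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Expr plainEq = Expr.call(Op.EQUALS, Expr.col("L.C1"), Expr.col("R.C2"));
        Expr arithEq = Expr.call(Op.EQUALS,
                Expr.call(Op.PLUS, Expr.col("L.C3"), Expr.lit()),
                Expr.call(Op.PLUS, Expr.col("R.C4"), Expr.lit()));
        Expr lessThan = Expr.call(Op.LESS_THAN, Expr.col("L.C5"), Expr.col("R.C6"));

        System.out.println(keepInJoin(JoinType.INNER, plainEq));
        System.out.println(keepInJoin(JoinType.INNER, arithEq));
        System.out.println(keepInJoin(JoinType.INNER, lessThan));
        System.out.println(keepInJoin(JoinType.LEFT, lessThan));
    }
}
```

In this model, a condition rejected by `keepInJoin` corresponds to one that stays in the Filter above the join instead of being pushed back into the join condition during `TRANSITIVE_CLOSURE`.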
> Transitive Closure leads to TPCH Queries regressions and OOM when run
> concurrency test
> --------------------------------------------------------------------------------------
>
> Key: DRILL-6374
> URL: https://issues.apache.org/jira/browse/DRILL-6374
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.14.0
> Environment: RHEL 7
> Reporter: Dechang Gu
> Assignee: Vitalii Diravka
> Priority: Critical
> Fix For: 1.14.0
>
> Attachments: TPCH_09_2_id_2517381b-1a61-3db5-40c3-4463bd421365.json,
> TPCH_09_2_id_2517497b-d4da-dab6-6124-abde5804a25f.json
>
>
> When the TPCH regression test is run on Apache Drill 1.14.0 master commit
> 6fcaf4268eddcb09010b5d9c5dfb3b3be5c3f903 (DRILL-6173), most of the queries
> regress.
> In particular, TPC-H Query 9 takes about 4x as long (36 sec vs 8.6 sec)
> as when it is run against the parent commit
> (9173308710c3decf8ff745493ad3e85ccdaf7c37).
> Furthermore, in the concurrency test for this commit, with 48 clients each
> running 16 TPCH queries (768 queries in total) and
> planner.width.max_per_node=5, some queries hit OOM and caused 273 queries
> to fail, while for the parent commit all 768 queries completed
> successfully.
>
> Profiles for TPCH_09 in the regression tests are uploaded:
> * The failing commit file name:
> [^TPCH_09_2_id_2517381b-1a61-3db5-40c3-4463bd421365.json]
> * The parent commit file name:
> [^TPCH_09_2_id_2517497b-d4da-dab6-6124-abde5804a25f.json]