[
https://issues.apache.org/jira/browse/DRILL-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155500#comment-16155500
]
ASF GitHub Bot commented on DRILL-5691:
---------------------------------------
Github user amansinha100 commented on a diff in the pull request:
https://github.com/apache/drill/pull/889#discussion_r137291711
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java ---
@@ -203,35 +203,27 @@ public static void addLeastRestrictiveCasts(LogicalExpression[] leftExpressions,
}
/**
- * Utility method to check if a subquery (represented by its root RelNode) is provably scalar. Currently
- * only aggregates with no group-by are considered scalar. In the future, this method should be generalized
- * to include more cases and reconciled with Calcite's notion of scalar.
+ * Utility method to check if a subquery (represented by its root RelNode) is provably scalar.
* @param root The root RelNode to be examined
* @return True if the root rel or its descendant is scalar, False otherwise
*/
public static boolean isScalarSubquery(RelNode root) {
- DrillAggregateRel agg = null;
- RelNode currentrel = root;
- while (agg == null && currentrel != null) {
- if (currentrel instanceof DrillAggregateRel) {
- agg = (DrillAggregateRel)currentrel;
- } else if (currentrel instanceof RelSubset) {
- currentrel = ((RelSubset)currentrel).getBest() ;
- } else if (currentrel.getInputs().size() == 1) {
- // If the rel is not an aggregate or RelSubset, but is a single-input rel (could be Project,
- // Filter, Sort etc.), check its input
- currentrel = currentrel.getInput(0);
- } else {
- break;
- }
- }
-
- if (agg != null) {
- if (agg.getGroupSet().isEmpty()) {
- return true;
+ RelMetadataQuery relMetadataQuery = RelMetadataQuery.instance();
+ RelNode currentRel = root;
+ for (; ; ) {
+ if (currentRel instanceof RelSubset) {
+ currentRel = ((RelSubset) currentRel).getBest();
+ } else if (currentRel != null) {
+ Double rowCount = relMetadataQuery.getRowCount(currentRel);
--- End diff --
getRowCount() is not correct here. Please see my earlier comment about using
the RelMdMaxRowCount.getMaxRowCount() APIs. getRowCount() returns an estimate
that may not match the actual run-time value, whereas getMaxRowCount() is an
assertion by the optimizer that the row count cannot exceed some number N
(in your case, 1).
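To illustrate the distinction, here is a minimal toy sketch. The types below (PlanNode, Scan, Filter, Aggregate, ScalarCheck) are hypothetical and are not Calcite or Drill classes; the numbers are made up. It only shows why a check based on an estimate can wrongly accept a subquery as scalar, while a check based on a guaranteed upper bound does not.

```java
// Toy model only: hypothetical types, NOT the Calcite/Drill API.
interface PlanNode {
    // Planner's guess; may differ from the actual run-time row count.
    double estimatedRowCount();
    // Guaranteed upper bound; POSITIVE_INFINITY when no bound is known.
    double maxRowCount();
}

final class Scan implements PlanNode {
    private final double estimate;
    Scan(double estimate) { this.estimate = estimate; }
    public double estimatedRowCount() { return estimate; }
    public double maxRowCount() { return Double.POSITIVE_INFINITY; }
}

final class Filter implements PlanNode {
    private final PlanNode input;
    private final double selectivity; // assumed fraction of rows that pass
    Filter(PlanNode input, double selectivity) {
        this.input = input;
        this.selectivity = selectivity;
    }
    public double estimatedRowCount() { return input.estimatedRowCount() * selectivity; }
    // A filter never adds rows, but it cannot promise to remove any either.
    public double maxRowCount() { return input.maxRowCount(); }
}

final class Aggregate implements PlanNode {
    private final PlanNode input;
    private final boolean hasGroupBy;
    Aggregate(PlanNode input, boolean hasGroupBy) {
        this.input = input;
        this.hasGroupBy = hasGroupBy;
    }
    public double estimatedRowCount() {
        // Arbitrary guess for the grouped case; exactly one row without group-by.
        return hasGroupBy ? input.estimatedRowCount() / 10.0 : 1.0;
    }
    public double maxRowCount() {
        // Without group-by, an aggregate produces at most one row.
        return hasGroupBy ? input.maxRowCount() : 1.0;
    }
}

final class ScalarCheck {
    // Unsafe: trusts an estimate.
    static boolean isScalarByEstimate(PlanNode node) {
        return node.estimatedRowCount() <= 1.0;
    }
    // Safe: relies on a provable upper bound.
    static boolean isScalarByMax(PlanNode node) {
        return node.maxRowCount() <= 1.0;
    }

    public static void main(String[] args) {
        PlanNode selectiveFilter = new Filter(new Scan(100.0), 0.005);
        PlanNode plainAggregate = new Aggregate(new Scan(100.0), false);
        System.out.println("filter, by estimate: " + isScalarByEstimate(selectiveFilter));
        System.out.println("filter, by max:      " + isScalarByMax(selectiveFilter));
        System.out.println("agg, by max:         " + isScalarByMax(plainAggregate));
    }
}
```

In this sketch the highly selective filter is estimated at less than one row, so the estimate-based check accepts it even though nothing prevents it from returning many rows at run time. Only the aggregate without group-by passes the bound-based check.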
> multiple count distinct query planning error at physical phase
> ---------------------------------------------------------------
>
> Key: DRILL-5691
> URL: https://issues.apache.org/jira/browse/DRILL-5691
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.9.0, 1.10.0
> Reporter: weijie.tong
>
> I materialized the count distinct query results in a cache and added a plugin
> rule that translates (Aggregate, Aggregate, Project, Scan) or
> (Aggregate, Aggregate, Scan) to (Project, Scan) at the PARTITION_PRUNING phase.
> Then, once a user issues a count distinct query, it is translated into a query
> against the cache to get the result.
> eg1: "select count(*), sum(a), count(distinct b) from t where dt=xx"
> eg2: "select count(*), sum(a), count(distinct b), count(distinct c) from t
> where dt=xxx"
> eg3: "select count(distinct b), count(distinct c) from t where dt=xxx"
> eg1 works and returns the result I expected, but eg2 fails at the
> physical phase. The error info is here:
> https://gist.github.com/weijietong/1b8ed12db9490bf006e8b3fe0ee52269.
> eg3 fails with a similar error.