GitHub user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/671#discussion_r91389995
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java ---
    @@ -43,6 +48,30 @@ public Double getDistinctRowCount(RelNode rel, ImmutableBitSet groupKey, RexNode
         }
       }
     
    +  @Override
    +  public Double getDistinctRowCount(Join rel, ImmutableBitSet groupKey, RexNode predicate) {
    +    Double count = null;
    +    if (rel != null) {
    +      if (rel instanceof JoinPrel) {
    +        // for Drill physical joins, don't recompute the distinct row count since it was already done
    +        // during logical planning; retrieve the cached value.
    +        count = ((JoinPrel)rel).getDistinctRowCount();
    +        if (count.doubleValue() < 0) {
    +          logger.warn("Invalid cached distinct row count for {}; recomputing..", rel.getDescription());
    +          count = super.getDistinctRowCount(rel, groupKey, predicate);
    +        }
    +      } else {
    +        count = super.getDistinctRowCount(rel, groupKey, predicate);
    --- End diff --
    
    The RelMdDistinctRowCount API seems to indicate that the distinct row
    count depends on both the groupKey and the predicate passed in. However,
    the value cached in DrillJoinRel is not differentiated by groupKey or
    predicate. Will this cause a problem when getDistinctRowCount() is called
    multiple times with different groupKey / predicate values?
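
To make the concern concrete, here is a rough sketch of one possible direction, not a proposal for the actual patch: key the cached value on the (groupKey, predicate) pair instead of storing a single value per join. Everything in it is hypothetical. The class name `DistinctRowCountCache` and its methods are made up for illustration, `java.util.BitSet` stands in for Calcite's `ImmutableBitSet`, and a plain `String` stands in for the `RexNode` predicate.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: caches a distinct row count per (groupKey, predicate)
 * pair rather than one value per join. Not Drill or Calcite code.
 */
final class DistinctRowCountCache {

  /** Cache key. BitSet and String are stand-ins for ImmutableBitSet and RexNode. */
  private static final class Key {
    private final BitSet groupKey;
    private final String predicate; // null means "no predicate"

    Key(BitSet groupKey, String predicate) {
      // BitSet is mutable, so keep a private copy to keep the key stable.
      this.groupKey = (BitSet) groupKey.clone();
      this.predicate = predicate;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof Key)) {
        return false;
      }
      Key other = (Key) o;
      return groupKey.equals(other.groupKey)
          && Objects.equals(predicate, other.predicate);
    }

    @Override
    public int hashCode() {
      return Objects.hash(groupKey, predicate);
    }
  }

  private final Map<Key, Double> cache = new HashMap<>();

  /**
   * Returns the cached count for this (groupKey, predicate) pair, calling
   * compute only on a miss. A null result from compute is not cached.
   */
  Double get(BitSet groupKey, String predicate, Supplier<Double> compute) {
    return cache.computeIfAbsent(new Key(groupKey, predicate), k -> compute.get());
  }

  /** Number of distinct (groupKey, predicate) pairs currently cached. */
  int size() {
    return cache.size();
  }
}
```

With a cache like this, a call with groupKey {0} and a later call with groupKey {1} would each get their own computed value, whereas a single cached field would return the first result for both.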
    



