amansinha100 commented on a change in pull request #1775: DRILL-7227: Fix
predicate check in DrillRelOptUtil.analyzeSimpleEquiJoin
URL: https://github.com/apache/drill/pull/1775#discussion_r279995636
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/planner/cost/DrillRelMdDistinctRowCount.java
##########
@@ -155,18 +155,18 @@ private Double getDistinctRowCountInternal(TableScan scan, RelMetadataQuery mq,
}
double s = 1.0;
- boolean allCols = true;
+ boolean allColsHaveNDV = true;
for (int i = 0; i < groupKey.length(); i++) {
final String colName = type.getFieldNames().get(i);
- // Skip NDV, if not available
if (!groupKey.get(i)) {
- allCols = false;
- break;
+ continue;
}
ColumnStatistics columnStatistics = tableMetadata != null ?
tableMetadata.getColumnStatistics(SchemaPath.getSimplePath(colName))
: null;
Double ndv = columnStatistics != null ? (Double)
columnStatistics.getStatistic(ColumnStatisticsKind.NDV) : null;
+ // Skip NDV, if not available
if (ndv == null) {
+ allColsHaveNDV = false;
Review comment:
If even one of the (potentially many) group-by columns lacks NDV stats, this
loop still goes on to check every remaining column and compute the
selectivity, only to return the default rowcount of 10% on line 185. Why not
break here, since we cannot make use of the other columns' NDV?
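
To make the suggestion concrete, here is a minimal, self-contained sketch of the early exit. It is not the actual Drill code: the class name, the method signature, the list of per-column NDVs, and the selectivity formula are assumptions chosen for illustration, and the 10% fallback is taken from the description above.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for the loop under review; all names are illustrative.
public class NdvEarlyExitSketch {
    // Assumed fallback fraction (the "default rowcount of 10%" mentioned above).
    static final double DEFAULT_FRACTION = 0.1;

    // ndvPerColumn holds one entry per table column; null means "no NDV stats".
    static double estimateDistinctRows(double rowCount, BitSet groupKey,
                                       List<Double> ndvPerColumn) {
        double s = 1.0;
        // Visit only the columns that are part of the group key.
        for (int i = groupKey.nextSetBit(0); i >= 0; i = groupKey.nextSetBit(i + 1)) {
            Double ndv = ndvPerColumn.get(i);
            if (ndv == null) {
                // One missing NDV makes the combined estimate unusable,
                // so stop immediately instead of scanning the remaining columns.
                return DEFAULT_FRACTION * rowCount;
            }
            // Placeholder combination of per-column NDVs (assumed formula).
            s *= 1.0 - ndv / rowCount;
        }
        return (1.0 - s) * rowCount;
    }

    public static void main(String[] args) {
        BitSet key = new BitSet();
        key.set(0);
        key.set(2);
        // Column 2 has no NDV, so the fallback is returned.
        System.out.println(estimateDistinctRows(1000.0, key,
            Arrays.asList(100.0, null, null)));
        // Both key columns have an NDV, so the combined estimate is returned.
        System.out.println(estimateDistinctRows(1000.0, key,
            Arrays.asList(100.0, null, 50.0)));
    }
}
```

Returning (or breaking) at the first missing NDV produces the same result as setting a flag and continuing, but skips the remaining statistics lookups.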