arina-ielchiieva commented on a change in pull request #1298: DRILL-5796:
Filter pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r202643532
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetIsPredicate.java
##########
@@ -124,8 +124,7 @@ private static LogicalExpression createIsTruePredicate(LogicalExpression expr) {
*/
private static LogicalExpression createIsFalsePredicate(LogicalExpression expr) {
return new ParquetIsPredicate<Boolean>(expr, (exprStat, evaluator) ->
- //if min value is not false or if there are all nulls -> canDrop
- isAllNulls(exprStat, evaluator.getRowCount()) || exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin()
+ exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin() || isAllNulls(exprStat, evaluator.getRowCount()) ? RowsMatch.NONE : checkNull(exprStat)
Review comment:
@vrozov this is a valid case; the same issue is described in DRILL-6603: we
missed the `hasNonNullValue` check for the `is null` predicate and filtered out
a row group that we shouldn't have.
@jbimbert `isAllNulls` is used more often in your code than in the previous
version. In the previous version it was used only three times, and each time it
had a `hasNonNullValue` check. I am afraid that in your code the checks without
`hasNonNullValue` may produce incorrect results. What if we modify the
`isAllNulls` method to include the `hasNonNullValue` check inside?
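
To make the concern about unguarded `isAllNulls` calls concrete, here is a minimal, self-contained sketch of how the added line groups under Java operator precedence (`&&` binds tighter than `||`, and both bind tighter than `?:`). `FakeStats` and this `isAllNulls` body are hypothetical stand-ins, not the Parquet or Drill classes; the real helper's body is not shown in the diff, so the comparison of null count to row count is an assumption.

```java
public class PredicateGrouping {

  // Hypothetical stand-in for the statistics object; NOT the Parquet API.
  static final class FakeStats {
    final boolean hasNonNullValue;
    final boolean min;
    final long numNulls;

    FakeStats(boolean hasNonNullValue, boolean min, long numNulls) {
      this.hasNonNullValue = hasNonNullValue;
      this.min = min;
      this.numNulls = numNulls;
    }
  }

  // Assumed helper body (not shown in the diff): true when the null count
  // equals the row count.
  static boolean isAllNulls(FakeStats stat, long rowCount) {
    return stat.numNulls == rowCount;
  }

  // Same shape as the condition on the added line: a && b || c
  static boolean canDrop(FakeStats stat, long rowCount) {
    return stat.hasNonNullValue && stat.min || isAllNulls(stat, rowCount);
  }

  // The same condition with the implicit grouping written out.
  static boolean canDropExplicit(FakeStats stat, long rowCount) {
    return (stat.hasNonNullValue && stat.min) || isAllNulls(stat, rowCount);
  }

  public static void main(String[] args) {
    long rowCount = 10;
    FakeStats[] cases = {
      new FakeStats(false, false, 10), // guard false, isAllNulls still evaluated
      new FakeStats(true, true, 0),
      new FakeStats(true, false, 0),
      new FakeStats(false, true, 0)    // min is ignored because the guard is false
    };
    for (FakeStats s : cases) {
      System.out.println(canDrop(s, rowCount) + " " + canDropExplicit(s, rowCount));
    }
  }
}
```

In this sketch the `hasNonNullValue` guard protects only the `min` read; `isAllNulls` is still evaluated whenever the left side is false. Moving the check into the helper would apply it at every call site instead of relying on each caller to add it.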