jbimbert commented on a change in pull request #1298: DRILL-5796: Filter pruning for multi rowgroup parquet file
URL: https://github.com/apache/drill/pull/1298#discussion_r203966145
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/expr/stat/ParquetIsPredicate.java
 ##########
 @@ -124,8 +124,7 @@ private static LogicalExpression createIsTruePredicate(LogicalExpression expr) {
    */
   private static LogicalExpression createIsFalsePredicate(LogicalExpression expr) {
     return new ParquetIsPredicate<Boolean>(expr, (exprStat, evaluator) ->
-        //if min value is not false or if there are all nulls  -> canDrop
-        isAllNulls(exprStat, evaluator.getRowCount()) || exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin()
+      isAllNulls(exprStat, evaluator.getRowCount()) || exprStat.hasNonNullValue() && ((BooleanStatistics) exprStat).getMin() ? RowsMatch.NONE : checkNull(exprStat)
 
 Review comment:
   blnTbl/0_0_1.parquet => ST:[min: false, max: false, num_nulls: 0] : 8 tests in testBooleanPredicate()
   tfTbl/ft0.parquet => ST:[min: false, max: true, num_nulls: 0] : 4 tests in testBooleanPredicate()
   
   example1:
   select * from `java-exec/src/test/resources/parquetFilterPush/blnTbl/0_0_1.parquet` where col_bln is false
   returns (false, false, false)
   
   example2:
   select * from `java-exec/src/test/resources/parquetFilterPush/tfTbl/ft0.parquet` where a is true [resp. false]
   returns true [resp. false]
   
   Finally, when running the query
   select * from dfs.tmp.`blnTbl` where col_bln is false
   where blnTbl contains only 0_0_0.parquet (T,T,T) and 0_0_1.parquet (F,F,F),
   the physical plan reads:
   00-00    Screen : rowType = RecordType(DYNAMIC_STAR **): rowcount = 3.0, cumulative cost = {9.3 rows, 12.3 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 523
   00-01      Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount = 3.0, cumulative cost = {9.0 rows, 12.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 522
   00-02        Project(**=[$0]) : rowType = RecordType(DYNAMIC_STAR **): rowcount = 3.0, cumulative cost = {6.0 rows, 9.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 521
   00-03          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/blnTbl/0_0_1.parquet]], selectionRoot=file:/tmp/blnTbl, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY col_bln): rowcount = 3.0, cumulative cost = {3.0 rows, 6.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 520
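   The plan no longer contains a Filter operator, since the predicate returns
   NONE for 0_0_0.parquet (the file is pruned from the scan) and ALL for
   0_0_1.parquet (every row matches, so no filtering is needed).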
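
   The NONE / ALL outcome described above can be illustrated with a small
   standalone sketch. It is not the code from this pull request: BoolStats
   and isFalse are hypothetical stand-ins for the Parquet column statistics
   and the "is false" predicate, and the check on max before returning ALL
   is an assumption of this sketch. The row count of 2 for ft0.parquet is
   also assumed, since it is not stated here.

```java
public class IsFalsePruningSketch {

    /** How many rows of a row group can satisfy the predicate. */
    public enum RowsMatch { ALL, NONE, SOME }

    /** Hypothetical stand-in for the boolean column statistics of one row group. */
    public static final class BoolStats {
        final boolean min;
        final boolean max;
        final long numNulls;
        final long rowCount;

        public BoolStats(boolean min, boolean max, long numNulls, long rowCount) {
            this.min = min;
            this.max = max;
            this.numNulls = numNulls;
            this.rowCount = rowCount;
        }
    }

    /** Classifies a row group for the predicate "col is false" using only its statistics. */
    public static RowsMatch isFalse(BoolStats s) {
        if (s.numNulls == s.rowCount) {
            return RowsMatch.NONE;   // only nulls: no row can be false
        }
        if (s.min) {
            return RowsMatch.NONE;   // smallest non-null value is true: no false value exists
        }
        if (s.numNulls == 0 && !s.max) {
            return RowsMatch.ALL;    // no nulls and largest value is false: every row is false
        }
        return RowsMatch.SOME;       // mixed values or some nulls: the filter must stay
    }

    public static void main(String[] args) {
        // Statistics taken from the files discussed in the comment above.
        System.out.println("0_0_0.parquet: " + isFalse(new BoolStats(true, true, 0, 3)));
        System.out.println("0_0_1.parquet: " + isFalse(new BoolStats(false, false, 0, 3)));
        // Row count of 2 is an assumption for illustration.
        System.out.println("ft0.parquet:   " + isFalse(new BoolStats(false, true, 0, 2)));
    }
}
```

   With these inputs the sketch classifies 0_0_0.parquet as NONE, 0_0_1.parquet
   as ALL, and ft0.parquet as SOME, which matches a plan that scans only
   0_0_1.parquet without a filter on top.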
   
