Ben-Zvi commented on a change in pull request #1783: DRILL-7240: Catch runtime
pruning filter-match exceptions and do not prune these rowgroups
URL: https://github.com/apache/drill/pull/1783#discussion_r281435745
##########
File path:
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/AbstractParquetScanBatchCreator.java
##########
@@ -184,18 +185,30 @@ protected ScanBatch getBatch(ExecutorFragmentContext context, AbstractParquetRow
//
// Perform the Run-Time Pruning - i.e. Skip this rowgroup if the match fails
//
- RowsMatch match = FilterEvaluatorUtils.matches(filterPredicate, columnsStatistics, footerRowCount);
-
- // collect logging info
- long timeToRead = pruneTimer.elapsed(TimeUnit.MICROSECONDS);
- pruneTimer.stop();
- pruneTimer.reset();
- totalPruneTime += timeToRead;
- logger.trace("Run-time pruning: {} row-group {} (RG index: {} row count: {}), took {} usec", // trace each single rowgroup
- match == RowsMatch.NONE ? "Excluded" : "Included", rowGroup.getPath(), rowGroupIndex, footerRowCount, timeToRead);
+ RowsMatch matchResult = RowsMatch.ALL;
+ try {
+ matchResult = FilterEvaluatorUtils.matches(filterPredicate, columnsStatistics, footerRowCount);
+
+ // collect logging info
+ long timeToRead = pruneTimer.elapsed(TimeUnit.MICROSECONDS);
+ pruneTimer.stop();
+ pruneTimer.reset();
+ totalPruneTime += timeToRead;
+ logger.trace("Run-time pruning: {} row-group {} (RG index: {} row count: {}), took {} usec", // trace each single rowgroup
+ matchResult == RowsMatch.NONE ? "Excluded" : "Included", rowGroup.getPath(), rowGroupIndex, footerRowCount, timeToRead);
+ } catch (ClassCastException cce) {
+ if ( ! matchCastErrorNotified ) {
+ logger.info("Run-time pruning check failed due to type casting. Skipping pruning rowgroups starting from {}. (Error: {})", rowGroup.getPath(), cce.getMessage());
Review comment:
Done. Here is a sample from the log (tested with 6 rowgroups/files: three of
them fail the check, and of the remaining three, one was pruned):
```
2019-05-06 18:37:19,809 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_3.parquet,,0,0,0,170059
2019-05-06 18:37:19,828 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_2.parquet,,0,0,0,4691
2019-05-06 18:37:19,829 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - Run-time pruning: Included row-group /tmp/twofoo/sub/0_0_2.parquet (RG index: 0 row count: 2), took 466 usec
2019-05-06 18:37:19,869 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_0.parquet,,0,0,0,37316
2019-05-06 18:37:19,870 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - Run-time pruning: Excluded row-group /tmp/twofoo/sub/0_0_0.parquet (RG index: 0 row count: 2), took 683 usec
2019-05-06 18:37:19,909 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_1.parquet,,0,0,0,38874
2019-05-06 18:37:19,914 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_4.parquet,,0,0,0,2417
2019-05-06 18:37:19,915 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - Run-time pruning: Included row-group /tmp/twofoo/sub/0_0_4.parquet (RG index: 0 row count: 2), took 356 usec
2019-05-06 18:37:19,919 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_5.parquet,,0,0,0,2749
2019-05-06 18:37:19,922 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] INFO o.a.d.e.s.p.AbstractParquetScanBatchCreator - Finished parquet_runtime_pruning in 1505 usec. Out of given 6 rowgroups, 1 were pruned.
2019-05-06 18:37:19,922 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] INFO o.a.d.e.s.p.AbstractParquetScanBatchCreator - Run-time pruning check skipped for 3 out of 6 rowgroups due to: java.lang.Integer cannot be cast to java.lang.Long
```
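For readers who want to reproduce the shape of that log outside Drill, here is a minimal standalone sketch of the pattern: default the match result to ALL, catch the ClassCastException per rowgroup, and report the skip count once at the end. Everything in it is an assumption made for illustration (the `RowGroup` and `Summary` classes, the `matches` helper, the threshold filter, and the statistics values); it is not the Drill code and does not use `FilterEvaluatorUtils`.

```java
import java.util.ArrayList;
import java.util.List;

public class RuntimePruningSketch {

  /** Simplified stand-in for the three match outcomes; not Drill's own enum. */
  enum RowsMatch { ALL, NONE, SOME }

  /** Hypothetical rowgroup: a path plus min/max statistics held as Object. */
  static final class RowGroup {
    final String path;
    final Object min;
    final Object max;
    RowGroup(String path, Object min, Object max) {
      this.path = path;
      this.min = min;
      this.max = max;
    }
  }

  /** Hypothetical result holder for the summary lines. */
  static final class Summary {
    int total;
    int pruned;
    int skipped;
    String firstError;
    final List<String> kept = new ArrayList<>();
  }

  /**
   * Hypothetical filter "value > threshold" that assumes Long statistics.
   * An Integer statistic makes the cast throw ClassCastException.
   */
  static RowsMatch matches(RowGroup rg, long threshold) {
    long max = (Long) rg.max;
    long min = (Long) rg.min;
    if (max <= threshold) {
      return RowsMatch.NONE; // no row can satisfy the filter
    }
    return min > threshold ? RowsMatch.ALL : RowsMatch.SOME;
  }

  static Summary prune(List<RowGroup> rowGroups, long threshold) {
    Summary s = new Summary();
    s.total = rowGroups.size();
    for (RowGroup rg : rowGroups) {
      RowsMatch matchResult = RowsMatch.ALL; // default: keep the rowgroup
      try {
        matchResult = matches(rg, threshold);
      } catch (ClassCastException cce) {
        s.skipped++; // check failed, so this rowgroup is not pruned
        if (s.firstError == null) {
          s.firstError = cce.getMessage(); // remember only the first error
        }
      }
      if (matchResult == RowsMatch.NONE) {
        s.pruned++;
      } else {
        s.kept.add(rg.path);
      }
    }
    return s;
  }

  /** Six made-up rowgroups: three with Integer statistics, three with Long. */
  static List<RowGroup> sample() {
    List<RowGroup> groups = new ArrayList<>();
    groups.add(new RowGroup("0_0_3.parquet", 1, 5));     // Integer: check fails
    groups.add(new RowGroup("0_0_2.parquet", 20L, 30L)); // included
    groups.add(new RowGroup("0_0_0.parquet", 1L, 5L));   // excluded
    groups.add(new RowGroup("0_0_1.parquet", 2, 8));     // Integer: check fails
    groups.add(new RowGroup("0_0_4.parquet", 5L, 15L));  // included
    groups.add(new RowGroup("0_0_5.parquet", 3, 9));     // Integer: check fails
    return groups;
  }

  public static void main(String[] args) {
    Summary s = prune(sample(), 10L);
    System.out.printf("Out of given %d rowgroups, %d were pruned.%n", s.total, s.pruned);
    if (s.skipped > 0) {
      System.out.printf("Run-time pruning check skipped for %d out of %d rowgroups due to: %s%n",
          s.skipped, s.total, s.firstError);
    }
  }
}
```

With the made-up statistics above, the sketch prunes one rowgroup and skips the check for three, matching the counts in the log. The exact exception text depends on the JDK version, so it may not read the same as the message shown in the sample.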