[
https://issues.apache.org/jira/browse/DRILL-4786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394131#comment-15394131
]
ASF GitHub Bot commented on DRILL-4786:
---------------------------------------
Github user jinfengni commented on a diff in the pull request:
https://github.com/apache/drill/pull/553#discussion_r72296459
--- Diff:
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java
---
@@ -387,16 +378,35 @@ protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectR
condition = condition.accept(reverseVisitor);
pruneCondition = pruneCondition.accept(reverseVisitor);
- if (checkForSingle && isSinglePartition && !wasAllPartitionsPruned) {
+ if (descriptor.supportsMetadataCachePruning() && !wasAllPartitionsPruned) {
// if metadata cache file could potentially be used, then assign a proper cacheFileRoot
- String path = "";
- for (int j = 0; j <= maxIndex; j++) {
- path += "/" + spInfo[j];
+ int index = -1;
+ if (!matchBitSet.isEmpty()) {
+ String path = "";
+ index = matchBitSet.length() - 1;
+
+ for (int j = 0; j < matchBitSet.length(); j++) {
+ if (!matchBitSet.get(j)) {
+ // stop at the first index with no match and use the immediate
+ // previous index
+ index = j-1;
+ break;
+ }
+ }
+ for (int j=0; j <= index; j++) {
+ path += "/" + spInfo[j];
+ }
+ cacheFileRoot = descriptor.getBaseTableLocation() + path;
--- End diff --
cacheFileRoot is only set inside the 'if' branch. Will cacheFileRoot be null
if matchBitSet has no bit set? If so, will cacheFileRoot == null cause issues
in the downstream logic?
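To make the concern concrete, here is a minimal standalone sketch of the logic in the diff. It assumes cacheFileRoot starts out as null (the diff does not show its initialization), and the class and method names (CacheFileRootSketch, computeCacheFileRoot) are hypothetical, not part of Drill.

```java
import java.util.BitSet;

public class CacheFileRootSketch {

  // Hypothetical helper mirroring the diff: walk matchBitSet up to the first
  // unset bit and build the path from the matching partition values in spInfo.
  static String computeCacheFileRoot(String baseTableLocation,
                                     BitSet matchBitSet,
                                     String[] spInfo) {
    String cacheFileRoot = null; // assumption: null unless the branch below runs
    if (!matchBitSet.isEmpty()) {
      int index = matchBitSet.length() - 1;
      for (int j = 0; j < matchBitSet.length(); j++) {
        if (!matchBitSet.get(j)) {
          // stop at the first index with no match and use the previous index
          index = j - 1;
          break;
        }
      }
      StringBuilder path = new StringBuilder();
      for (int j = 0; j <= index; j++) {
        path.append("/").append(spInfo[j]);
      }
      cacheFileRoot = baseTableLocation + path;
    }
    return cacheFileRoot;
  }

  public static void main(String[] args) {
    BitSet onlyDir0 = new BitSet();
    onlyDir0.set(0); // dir0 matched a single value, dir1 did not
    System.out.println(
        computeCacheFileRoot("/A", onlyDir0, new String[] {"B", null}));

    BitSet none = new BitSet(); // no bit set: the 'if' branch is skipped
    System.out.println(
        computeCacheFileRoot("/A", none, new String[] {null, null}));
  }
}
```

In this sketch the second call returns null, which is the case the question above is about: any downstream code that uses cacheFileRoot would need either a null check or a fallback such as the base table location.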
> Improve metadata cache performance for queries with multiple partitions
> -----------------------------------------------------------------------
>
> Key: DRILL-4786
> URL: https://issues.apache.org/jira/browse/DRILL-4786
> Project: Apache Drill
> Issue Type: Improvement
> Components: Metadata, Query Planning & Optimization
> Affects Versions: 1.7.0
> Reporter: Aman Sinha
> Assignee: Aman Sinha
>
> Consider queries of the following type run against Parquet data with
> metadata caching:
> {noformat}
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 IN ('1', '2', '3')
> {noformat}
> For such queries, Drill will read the metadata cache file from the top level
> directory 'A', which is not very efficient since we are only interested in
> the files from some subdirectories of 'B'. DRILL-4530 improves the
> performance of such queries when the leaf level directory is a single
> partition. Here, there are 3 subpartitions due to the IN list. We can
> build upon the DRILL-4530 enhancement by at least reading the cache file from
> the immediate parent level `/A/B` instead of the top level.
> The goal of this JIRA is to improve performance for such types of queries.