[
https://issues.apache.org/jira/browse/DRILL-4589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229785#comment-15229785
]
ASF GitHub Bot commented on DRILL-4589:
---------------------------------------
Github user hsuanyi commented on a diff in the pull request:
https://github.com/apache/drill/pull/468#discussion_r58824977
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/DFSDirPartitionLocation.java ---
@@ -0,0 +1,70 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Class defines a single partition corresponding to a directory in a DFS table.
+ */
+package org.apache.drill.exec.planner;
+
+
+import com.google.common.collect.Lists;
+
+import java.util.Collection;
+import java.util.List;
+
+public class DFSDirPartitionLocation implements PartitionLocation {
+ private final Collection<PartitionLocation> subPartitions;
+ private final String[] dirs;
+
+ public DFSDirPartitionLocation(String[] dirs, Collection<PartitionLocation> subPartitions) {
+ this.subPartitions = subPartitions;
+ this.dirs = dirs;
+ }
+
+ @Override
+ public String getPartitionValue(int index) {
+ assert index < dirs.length;
--- End diff --
I think the next line will throw an IndexOutOfBoundsException (IOOB) anyway if this assertion is not satisfied.
(But this is a minor thing.)
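To illustrate the point, here is a minimal sketch. It assumes the line following the assert is a plain array access such as `return dirs[index];` (the diff is cut off before that line, so this is an assumption). `DirValues` and `probe` are hypothetical names, not part of the pull request. Java assertions are only checked when the JVM is started with `-ea`, so in a default run it is the array access itself that rejects an out-of-range index.

```java
// Hypothetical stand-in for the class under review; not the actual Drill code.
final class DirValues {
  private final String[] dirs;

  DirValues(String[] dirs) {
    this.dirs = dirs;
  }

  // Assumed shape of the reviewed method: an assert followed by an array access.
  String getPartitionValue(int index) {
    assert index < dirs.length; // only checked when the JVM runs with -ea
    return dirs[index];         // throws ArrayIndexOutOfBoundsException for a bad index
  }

  // Reports which failure, if any, a given index triggers.
  static String probe(DirValues v, int index) {
    try {
      return "ok: " + v.getPartitionValue(index);
    } catch (AssertionError e) {
      return "AssertionError";                 // assertions enabled (-ea)
    } catch (ArrayIndexOutOfBoundsException e) {
      return "ArrayIndexOutOfBoundsException"; // assertions disabled (JVM default)
    }
  }

  public static void main(String[] args) {
    DirValues v = new DirValues(new String[] {"2015", "Q1"});
    System.out.println(probe(v, 1)); // valid index
    System.out.println(probe(v, 2)); // out of range: which error depends on -ea
  }
}
```

Either way an out-of-range index fails; the assert only changes which error is raised when assertions are enabled.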
> Reduce planning time for file system partition pruning by reducing filter
> evaluation overhead
> ---------------------------------------------------------------------------------------------
>
> Key: DRILL-4589
> URL: https://issues.apache.org/jira/browse/DRILL-4589
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Reporter: Jinfeng Ni
> Assignee: Jinfeng Ni
>
> When Drill is used to query hundreds of thousands, or even millions, of files
> organized into multi-level directories, users typically provide a
> partition filter such as: dir0 = something and dir1 = something2 and ...
> For such queries, we saw that query planning time could be unacceptably long,
> due to three main overheads: 1) expanding and getting the list of files, 2)
> evaluating the partition filter, and 3) getting the metadata, in the case of
> Parquet files for which a metadata cache file is not available.
> DRILL-2517 targets the third overhead. As follow-up work after
> DRILL-2517, we plan to reduce the filter evaluation overhead. Currently,
> partition filter evaluation is applied at the file level. In many cases, we saw
> that the number of leaf subdirectories is significantly lower than the number
> of files. Since all the files under the same leaf subdirectory share the same
> directory metadata, we should apply the filter evaluation at the leaf
> subdirectory level. By doing that, we could reduce both the CPU overhead of
> evaluating the filter and the memory overhead.
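
The idea in the description can be sketched roughly as follows. This is a minimal illustration, not Drill's implementation: `DirLevelPruning`, `pruneByDirectory`, the `evaluations` counter, and the `Predicate<String>` filter are hypothetical, and file paths are treated as plain '/'-separated strings. Files are grouped by their leaf directory, and the filter runs once per directory rather than once per file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of directory-level filter evaluation; not Drill code.
final class DirLevelPruning {
  // Counts filter evaluations so the saving is visible.
  static int evaluations = 0;

  // Returns the parent directory of a '/'-separated path.
  static String parentDir(String file) {
    int slash = file.lastIndexOf('/');
    return slash < 0 ? "" : file.substring(0, slash);
  }

  // Keeps the files whose leaf directory passes the filter.
  static List<String> pruneByDirectory(List<String> files, Predicate<String> dirFilter) {
    // Group files by leaf directory, preserving first-seen order.
    Map<String, List<String>> byDir = new LinkedHashMap<>();
    for (String f : files) {
      byDir.computeIfAbsent(parentDir(f), d -> new ArrayList<>()).add(f);
    }
    List<String> kept = new ArrayList<>();
    for (Map.Entry<String, List<String>> e : byDir.entrySet()) {
      evaluations++; // one evaluation per directory, not per file
      if (dirFilter.test(e.getKey())) {
        kept.addAll(e.getValue());
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> files = Arrays.asList(
        "/t/2015/Q1/a.parquet", "/t/2015/Q1/b.parquet",
        "/t/2015/Q2/c.parquet", "/t/2016/Q1/d.parquet");
    // Stand-in for a filter like dir0 = '2015' and dir1 = 'Q1'.
    List<String> kept = pruneByDirectory(files, dir -> dir.endsWith("/2015/Q1"));
    System.out.println(kept + " after " + evaluations + " filter evaluations");
  }
}
```

With four files spread over three leaf directories, the filter is evaluated three times instead of four; the gap widens as the number of files per directory grows.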