[ 
https://issues.apache.org/jira/browse/DRILL-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567610#comment-16567610
 ] 

ASF GitHub Bot commented on DRILL-6640:
---------------------------------------

HanumathRao commented on a change in pull request #1405: DRILL-6640: Modifying 
DotDrillUtil implementation to avoid using globStatus calls
URL: https://github.com/apache/drill/pull/1405#discussion_r207409509
 
 

 ##########
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/dotdrill/DotDrillUtil.java
 ##########
 @@ -48,16 +59,70 @@
     }
     return files;
   }
-
+  /**
+   * Return list of DotDrillFile objects whose file name ends with .drill and matches the provided Drill Dot files types
+   * in a given parent Path.
+   * Return an empty list if no files matches the given Dot Drill File Types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param types Dot Drill Types to be matched
+   * @return List of matched DotDrillFile objects
+   * @throws IOException
+   */
   public static List<DotDrillFile> getDotDrills(DrillFileSystem fs, Path root, DotDrillType... types) throws IOException{
-    return getDrillFiles(fs, fs.globStatus(new Path(root, "*.drill")), types);
+    return getDrillFiles(fs,  getDrillFileStatus(fs,root,"*.drill",types) , types);
   }
 
+  /**
+   * Return list of DotDrillFile objects whose file name matches the provided name pattern and Drill Dot files types
+   * in a given parent Path.
+   * Return an empty list if no files matches the given file name and Dot Drill File Types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param name name/pattern of the file
+   * @param types Dot Drill Types to be matched
+   * @return List of matched DotDrillFile objects
+   * @throws IOException
+   */
   public static List<DotDrillFile> getDotDrills(DrillFileSystem fs, Path root, String name, DotDrillType... types) throws IOException{
-    if(!name.endsWith(".drill")) {
-      name = name + DotDrillType.DOT_DRILL_GLOB;
-    }
+   return getDrillFiles(fs, getDrillFileStatus(fs,root,name,types), types);
+  }
 
-    return getDrillFiles(fs, fs.globStatus(new Path(root, name)), types);
+  /**
+   * Return list of FileStatus objects whose file name matches the provided name pattern and Drill Dot file types
+   * in a given parent Path.
+   * Return an empty list if no files matches the pattern and Drill Dot file types.
+   * @param fs DrillFileSystem instance
+   * @param root parent Path
+   * @param name name/pattern of the file
+   * @param types Dot Drill Types to be matched
+   * @return List of FileStatuses for files matching name and  Drill Dot file types.
+   * @throws IOException  if any I/O error occurs when fetching file status
+   */
+  private static List<FileStatus> getDrillFileStatus(DrillFileSystem fs, Path root, String name, DotDrillType... types) throws IOException{
+    List<FileStatus> statusList = new ArrayList<FileStatus>();
+
+    if (name.endsWith(".drill")) {
+      FileStatus[] status = fs.globStatus(new Path(root, name));
 
 Review comment:
   In this case, does it mean that types should not matter? If so, would it be 
good to add an assert?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Drill takes long time in planning when there are large number of files in  
> views/tables DFS parent directory
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6640
>                 URL: https://issues.apache.org/jira/browse/DRILL-6640
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>            Reporter: Arjun
>            Assignee: Arjun
>            Priority: Major
>             Fix For: 1.15.0
>
>
> When Drill is used for querying views/tables, the query planning time 
> increases as the number of files in the views/tables parent directory increases. 
> With complex queries, the planning time becomes unacceptably long.
> The cause is the globStatus operation, which uses a GLOB pattern to retrieve 
> view file status. Planning can be improved by avoiding GLOB patterns 
> for Drill metadata files such as view files.
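
The idea in the description above can be illustrated with a minimal sketch. It is not the code from pull request #1405: it uses java.nio.file as a stand-in for DrillFileSystem, and the class name DirectDotDrillLookup, the method findExact, and the ".view.drill" suffix are assumptions made for illustration only. For a name without wildcards, it probes one exact path per suffix instead of matching a pattern against every entry in the parent directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; java.nio.file stands in for DrillFileSystem.
class DirectDotDrillLookup {

  /**
   * Resolves a wildcard-free name by probing one exact path per suffix.
   * The number of file system checks depends on the number of suffixes,
   * not on how many files the parent directory contains.
   */
  public static List<Path> findExact(Path root, String name, String... suffixes) {
    List<Path> found = new ArrayList<>();
    for (String suffix : suffixes) {
      // Append the suffix unless the caller already supplied it.
      String fileName = name.endsWith(suffix) ? name : name + suffix;
      Path candidate = root.resolve(fileName);
      if (Files.isRegularFile(candidate)) {
        found.add(candidate);
      }
    }
    return found;
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("dotdrill");
    Files.createFile(root.resolve("orders.view.drill"));
    // Unrelated files in the same directory do not add to the lookup cost.
    for (int i = 0; i < 1000; i++) {
      Files.createFile(root.resolve("data" + i + ".parquet"));
    }
    // The suffix ".view.drill" is an assumed example value.
    System.out.println(findExact(root, "orders", ".view.drill"));
    System.out.println(findExact(root, "missing", ".view.drill"));
  }
}
```

A name that does contain wildcard characters would still need a pattern match, so a sketch like this only covers the plain-name case.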



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
