paul-rogers commented on a change in pull request #2026: DRILL-7330: Implement metadata usage for all format plugins
URL: https://github.com/apache/drill/pull/2026#discussion_r394695247
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyGroupScan.java
 ##########
 @@ -90,17 +95,14 @@ public EasyGroupScan(
       @JsonProperty("selectionRoot") Path selectionRoot,
       @JsonProperty("schema") TupleMetadata schema
       ) throws IOException {
-    super(ImpersonationUtil.resolveUserName(userName));
+    super(ImpersonationUtil.resolveUserName(userName), columns, ValueExpressions.BooleanExpression.TRUE);
     this.selection = FileSelection.create(null, files, selectionRoot);
     this.formatPlugin = engineRegistry.resolveFormat(storageConfig, formatConfig, EasyFormatPlugin.class);
     this.columns = columns == null ? ALL_COLUMNS : columns;
     this.selectionRoot = selectionRoot;
-    SimpleFileTableMetadataProviderBuilder builder =
-        (SimpleFileTableMetadataProviderBuilder)
-        new FileSystemMetadataProviderManager()
-        .builder(MetadataProviderManager.MetadataProviderKind.SCHEMA_STATS_ONLY);
 
-    this.metadataProvider = builder.withLocation(selection.getSelectionRoot())
+    this.metadataProvider = defaultTableMetadataProviderBuilder(new FileSystemMetadataProviderManager())
 
 Review comment:
   This will be a huge problem in an actual production system. Fixing it is 
beyond the scope of this PR. I would suggest that the team think a bit about 
how this can work longer term. As noted, Impala struggled with this issue for 
years, so it is not simple.
   
   One answer is to know when directories change: use cached metadata for 
unchanged directories (which will be most of the data history) and expand only 
those that are "live" (i.e., partitions for the last day or two).
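
   The idea above can be sketched roughly as follows. This is an illustrative 
sketch only, not Drill code: `DirectoryMetadataCache`, `filesFor`, and 
`expansionCount` are hypothetical names, and it assumes a directory's 
modification time is a usable "has this changed?" signal, which depends on the 
file system. The caller supplies the current modification time and a listing 
function, so the sketch itself does no I/O.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of per-directory metadata caching; not a Drill class. */
final class DirectoryMetadataCache {

  /** A cached listing plus the directory modification time it was built from. */
  private static final class Entry {
    final long modificationTime;
    final List<String> files;

    Entry(long modificationTime, List<String> files) {
      this.modificationTime = modificationTime;
      this.files = files;
    }
  }

  private final Map<String, Entry> entries = new HashMap<>();
  private int expansions;

  /**
   * Returns the cached file list when the directory is unchanged, otherwise
   * re-expands the directory with the supplied lister and caches the result.
   * Assumption: an equal modification time means "unchanged".
   */
  List<String> filesFor(String directory, long currentModificationTime,
      Function<String, List<String>> lister) {
    Entry cached = entries.get(directory);
    if (cached != null && cached.modificationTime == currentModificationTime) {
      return cached.files; // unchanged directory: no expansion needed
    }
    // New or "live" directory: expand it and remember the result.
    List<String> files =
        Collections.unmodifiableList(new ArrayList<>(lister.apply(directory)));
    entries.put(directory, new Entry(currentModificationTime, files));
    expansions++;
    return files;
  }

  /** Number of times a directory actually had to be expanded. */
  int expansionCount() {
    return expansions;
  }
}
```

   With something like this, a planner would pay the listing cost only for 
directories whose modification time moved since the last query; historical 
partitions would be served from the cache. Invalidation across nodes and 
nested directories is not addressed here.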
   
   Caching is essential for good performance. In the past, Drill was s-l-o-w 
when reading the cached Parquet metadata.
   
   But, again, let's leave this issue for another project.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services