vvysotskyi commented on a change in pull request #2026: DRILL-7330: Implement metadata usage for all format plugins
URL: https://github.com/apache/drill/pull/2026#discussion_r392722720
##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyGroupScan.java
##########
@@ -124,13 +127,16 @@ public EasyGroupScan(
       // use file system metadata provider without specified schema and statistics
       metadataProviderManager = new FileSystemMetadataProviderManager();
     }
-    SimpleFileTableMetadataProviderBuilder builder =
-        (SimpleFileTableMetadataProviderBuilder) metadataProviderManager.builder(
-            MetadataProviderManager.MetadataProviderKind.SCHEMA_STATS_ONLY);
+    DrillFileSystem fs =
+        ImpersonationUtil.createFileSystem(ImpersonationUtil.resolveUserName(userName), formatPlugin.getFsConf());
-    this.metadataProvider = builder.withLocation(selection.getSelectionRoot())
+    this.metadataProvider = tableMetadataProviderBuilder(metadataProviderManager)
+        .withSelection(selection)
+        .withFileSystem(fs)
         .build();
+    this.usedMetastore = metadataProviderManager.usesMetastore();
     initFromSelection(selection, formatPlugin);
+    checkMetadataConsistency(selection, formatPlugin.getFsConf());

Review comment:
With the metadata, we can prune files that don't need to be read, drop filters that are always true, and skip unnecessary files for limit queries. This check is one of the costs of these optimizations, but I believe that with them we can significantly improve query runtime.
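
To illustrate the three optimizations mentioned in the comment, here is a minimal sketch. It is not Drill's implementation: `MetadataPruningSketch`, `FileStats`, and all method names are hypothetical, and it assumes each file carries min/max statistics and a row count for a single non-null numeric column, with a filter of the form `column > threshold`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MetadataPruningSketch {

  /** Hypothetical per-file statistics for one non-null numeric column. */
  static final class FileStats {
    final String path;
    final long min;
    final long max;
    final long rowCount;

    FileStats(String path, long min, long max, long rowCount) {
      this.path = path;
      this.min = min;
      this.max = max;
      this.rowCount = rowCount;
    }
  }

  /** Keeps only files that may contain rows matching `column > threshold`. */
  static List<FileStats> pruneForGreaterThan(List<FileStats> files, long threshold) {
    List<FileStats> kept = new ArrayList<>();
    for (FileStats f : files) {
      // If even the largest value fails the filter, no row in the file can match.
      if (f.max > threshold) {
        kept.add(f);
      }
    }
    return kept;
  }

  /** Returns true when every row of every file satisfies `column > threshold`. */
  static boolean filterAlwaysTrue(List<FileStats> files, long threshold) {
    for (FileStats f : files) {
      // If the smallest value may fail the filter, the filter must stay.
      if (f.min <= threshold) {
        return false;
      }
    }
    return true;
  }

  /**
   * Keeps the shortest prefix of files whose row counts cover the limit.
   * Only valid when no filter remains to be applied to the rows.
   */
  static List<FileStats> pruneForLimit(List<FileStats> files, long limit) {
    List<FileStats> kept = new ArrayList<>();
    long rows = 0;
    for (FileStats f : files) {
      if (rows >= limit) {
        break;
      }
      kept.add(f);
      rows += f.rowCount;
    }
    return kept;
  }

  public static void main(String[] args) {
    List<FileStats> files = Arrays.asList(
        new FileStats("a.csv", 0, 10, 100),
        new FileStats("b.csv", 11, 20, 100),
        new FileStats("c.csv", 21, 30, 100));

    for (FileStats f : pruneForGreaterThan(files, 15)) {
      System.out.println("must read: " + f.path);
    }
    System.out.println("filter droppable: " + filterAlwaysTrue(files, -1));
    System.out.println("files for LIMIT 150: " + pruneForLimit(files, 150).size());
  }
}
```

All three decisions rely on the stored statistics matching the files on disk, which is why a consistency check before planning is the price of using them.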