[
https://issues.apache.org/jira/browse/DRILL-7330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062151#comment-17062151
]
ASF GitHub Bot commented on DRILL-7330:
---------------------------------------
paul-rogers commented on pull request #2026: DRILL-7330: Implement metadata usage for all format plugins
URL: https://github.com/apache/drill/pull/2026#discussion_r394695247
##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/easy/EasyGroupScan.java
##########
@@ -90,17 +95,14 @@ public EasyGroupScan(
@JsonProperty("selectionRoot") Path selectionRoot,
@JsonProperty("schema") TupleMetadata schema
) throws IOException {
- super(ImpersonationUtil.resolveUserName(userName));
+ super(ImpersonationUtil.resolveUserName(userName), columns, ValueExpressions.BooleanExpression.TRUE);
this.selection = FileSelection.create(null, files, selectionRoot);
this.formatPlugin = engineRegistry.resolveFormat(storageConfig, formatConfig, EasyFormatPlugin.class);
this.columns = columns == null ? ALL_COLUMNS : columns;
this.selectionRoot = selectionRoot;
- SimpleFileTableMetadataProviderBuilder builder =
- (SimpleFileTableMetadataProviderBuilder)
- new FileSystemMetadataProviderManager()
- .builder(MetadataProviderManager.MetadataProviderKind.SCHEMA_STATS_ONLY);
- this.metadataProvider = builder.withLocation(selection.getSelectionRoot())
+ this.metadataProvider = defaultTableMetadataProviderBuilder(new FileSystemMetadataProviderManager())
Review comment:
This will be a huge problem in an actual production system. Fixing it is
beyond the scope of this PR. I would suggest that the team think a bit about
how this can work longer term. As noted, Impala struggled with this issue for
years, so it is not simple.
One answer is to know when directories change: use cached metadata for
unchanged directories (which will be most of the data history) and expand only
those that are "live" (i.e., partitions for the last day or two).
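To make the idea concrete, here is a minimal sketch of a per-directory cache that re-lists a directory only when its modification time has changed. It is not Drill code: the class `DirectoryMetadataCache` and its methods are hypothetical, it uses `java.nio.file` on a local file system rather than the Hadoop `FileSystem` API, and it assumes the directory modification time is a good enough change signal.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical sketch (not a Drill class): directory listings cached by modification time. */
class DirectoryMetadataCache {

  /** A cached listing plus the directory modification time seen when it was built. */
  private static final class Entry {
    final FileTime modified;
    final List<Path> files;

    Entry(FileTime modified, List<Path> files) {
      this.modified = modified;
      this.files = files;
    }
  }

  // Plain HashMap: this sketch is not thread-safe.
  private final Map<Path, Entry> cache = new HashMap<>();
  private int expansions; // how many times a directory was actually listed

  /** Returns the cached listing for an unchanged directory, otherwise re-lists it. */
  List<Path> filesIn(Path dir) throws IOException {
    FileTime current = Files.getLastModifiedTime(dir);
    Entry cached = cache.get(dir);
    if (cached != null && cached.modified.equals(current)) {
      return cached.files; // unchanged directory: no expansion needed
    }
    List<Path> files = new ArrayList<>();
    try (Stream<Path> children = Files.list(dir)) {
      children.filter(Files::isRegularFile).sorted().forEach(files::add);
    }
    cache.put(dir, new Entry(current, files));
    expansions++;
    return files;
  }

  /** Number of real directory listings performed so far. */
  int expansions() {
    return expansions;
  }
}
```

With this shape, old partitions cost one modification-time lookup per query, while "live" partitions are re-expanded whenever files are added or removed. A directory's modification time typically changes only when direct children are created or deleted, so in-place file rewrites and changes in nested directories would need a separate check.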
Caching is essential for good performance, but the cache has to be cheap to
read: in the past, Drill was s-l-o-w when reading the cached Parquet metadata.
But, again, let's leave this issue for another project.
> Implement metadata usage for text format plugin
> -----------------------------------------------
>
> Key: DRILL-7330
> URL: https://issues.apache.org/jira/browse/DRILL-7330
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: Arina Ielchiieva
> Assignee: Vova Vysotskyi
> Priority: Major
> Fix For: 1.18.0
>
>
> 1. Change the current group scan to leverage the schema from the Metastore.
> 2. Use stats to enable additional logical planning rules for the text format
> plugin. This will enable optimizations such as limit, filter push-down and so on.
> + Add the ability to pass a schema through a schema file (using a path or table
> root) or inline.
> + Check for other enhancements in the ANALYZE command.