[ https://issues.apache.org/jira/browse/DRILL-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148927#comment-15148927 ]
ASF GitHub Bot commented on DRILL-4287:
---------------------------------------
Github user adeneche commented on a diff in the pull request:
https://github.com/apache/drill/pull/376#discussion_r53043516
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
@@ -529,6 +549,36 @@ public long getRowCount() {
}
+
+ // Create and return a new file selection based on reading the metadata cache file.
+ // This function also initializes a few of ParquetGroupScan's fields as appropriate.
+ private FileSelection
+ initFromMetadataCache(DrillFileSystem fs, FileSelection selection) throws IOException {
+   FileStatus metaRootDir = selection.getFirstPath(fs);
+   Path metaFilePath = new Path(metaRootDir.getPath(), Metadata.METADATA_FILENAME);
+
+   // get (and set internal field) the metadata for the directory by reading the metadata file
+   this.parquetTableMetadata = Metadata.readBlockMeta(fs, metaFilePath.toString());
+   List<String> fileNames = Lists.newArrayList();
+   for (Metadata.ParquetFileMetadata file : parquetTableMetadata.getFiles()) {
+     fileNames.add(file.getPath());
+   }
+   // when creating the file selection, set the selection root in the form /a/b instead of
+   // file:/a/b. The reason is that the file names above have been created in the form
+   // /a/b/c.parquet and the format of the selection root must match that of the file names
+   // otherwise downstream operations such as partition pruning can break.
+   final Path metaRootPath = Path.getPathWithoutSchemeAndAuthority(metaRootDir.getPath());
+   this.selectionRoot = metaRootPath.toString();
+
+   // Use the FileSelection constructor directly here instead of the FileSelection.create() method
+   // because create() changes the root to include the scheme and authority; In future, if create()
+   // is the preferred way to instantiate a file selection, we may need to do something different...
+   FileSelection newSelection = new FileSelection(selection.getStatuses(fs), fileNames, metaRootPath.toString());
--- End diff --
Unfortunately, trying to fix this will introduce a performance regression;
see [DRILL-4380](https://issues.apache.org/jira/browse/DRILL-4380) for more
details.
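
For readers following the diff's comment about the selection root, here is a minimal, self-contained sketch of why the root should be in the form /a/b rather than file:/a/b. It does not use Hadoop's Path or any Drill class: `SelectionRootSketch`, `stripSchemeAndAuthority` and `isUnderRoot` are hypothetical names, and `java.net.URI` stands in for `Path.getPathWithoutSchemeAndAuthority()` purely for illustration. The prefix check is likewise an assumption about how a downstream consumer might relate file names to the root, not Drill's actual pruning logic.

```java
import java.net.URI;

// Illustration only: not Drill code. java.net.URI stands in for Hadoop's Path.
class SelectionRootSketch {

  // Hypothetical helper: keep only the path component, dropping scheme and authority.
  static String stripSchemeAndAuthority(String location) {
    return URI.create(location).getPath();
  }

  // Hypothetical check: does a file name sit under the given selection root?
  static boolean isUnderRoot(String fileName, String selectionRoot) {
    return fileName.startsWith(selectionRoot + "/");
  }

  public static void main(String[] args) {
    String fileName = "/a/b/c.parquet";   // form used by the file names in the metadata cache
    String rawRoot = "file:/a/b";         // root that still carries the scheme
    String strippedRoot = stripSchemeAndAuthority(rawRoot);

    System.out.println(isUnderRoot(fileName, rawRoot));       // false: forms do not match
    System.out.println(isUnderRoot(fileName, strippedRoot));  // true: both are /a/b style
  }
}
```

With the scheme left in place the two strings never share a prefix, which is the kind of mismatch the comment warns about for operations that compare file names against the root.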
> Do lazy reading of parquet metadata cache file
> ----------------------------------------------
>
> Key: DRILL-4287
> URL: https://issues.apache.org/jira/browse/DRILL-4287
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.4.0
> Reporter: Aman Sinha
> Assignee: Jinfeng Ni
>
> Currently, the parquet metadata cache file is read eagerly during creation of
> the DrillTable (as part of ParquetFormatMatcher.isReadable()). This is not
> desirable from a performance standpoint, since there are scenarios where we want
> to do some up-front optimizations (e.g. directory-based partition pruning,
> see DRILL-2517, or a potential limit 0 optimization), and in such
> situations it is better to read the metadata cache file lazily.
> This is a placeholder for implementing such delayed reading, since it is needed for
> the aforementioned optimizations.
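
As a rough illustration of the eager versus lazy distinction described in the issue, the sketch below defers an expensive read until the first caller actually asks for the value. It is not Drill's implementation: `LazyMetadata` and `LazyMetadataDemo` are hypothetical names, and the string returned by the loader is a stand-in for the parsed metadata cache file.

```java
import java.util.function.Supplier;

// Illustration only: a generic lazy holder, not a Drill class.
final class LazyMetadata<T> {
  private final Supplier<T> loader;  // the expensive read, e.g. parsing a cache file
  private T value;
  private boolean loaded;

  LazyMetadata(Supplier<T> loader) {
    this.loader = loader;
  }

  // Runs the loader on the first call only; later calls reuse the cached value.
  synchronized T get() {
    if (!loaded) {
      value = loader.get();
      loaded = true;
    }
    return value;
  }

  synchronized boolean isLoaded() {
    return loaded;
  }
}

class LazyMetadataDemo {
  public static void main(String[] args) {
    int[] reads = {0};
    LazyMetadata<String> metadata = new LazyMetadata<>(() -> {
      reads[0]++;                       // count how often the "file" is read
      return "parsed metadata";         // stand-in for the real parsed content
    });

    // Planning steps that do not need the metadata never trigger a read.
    System.out.println(metadata.isLoaded());

    // The first access triggers the read; the second reuses the result.
    metadata.get();
    metadata.get();
    System.out.println(reads[0]);
  }
}
```

Under this pattern, an optimization that can be decided from directory names alone would finish without touching the cache file at all, which is the benefit the issue is after.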
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)