[ https://issues.apache.org/jira/browse/DRILL-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141583#comment-15141583 ]
ASF GitHub Bot commented on DRILL-4287:
---------------------------------------
Github user adeneche commented on a diff in the pull request:
https://github.com/apache/drill/pull/345#discussion_r52519003
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---
@@ -521,6 +543,25 @@ public void setEndpointByteMap(EndpointByteMap byteMap) {
     }
   }
+  private FileSelection getSelectionFromMetadataCache(DrillFileSystem fs, FileSelection selection) throws IOException {
+    FileStatus metaRootDir = selection.getFirstPath(fs);
+    Path metaFilePath = new Path(metaRootDir.getPath(), Metadata.METADATA_FILENAME);
+
+    // get the metadata for the directory by reading the metadata file
+    Metadata.ParquetTableMetadataBase metadata = Metadata.readBlockMeta(fs, metaFilePath.toString());
+    List<String> fileNames = Lists.newArrayList();
+    for (Metadata.ParquetFileMetadata file : metadata.getFiles()) {
+      fileNames.add(file.getPath());
+    }
+    // when creating the file selection, set the selection root in the form /a/b instead of
+    // file:/a/b. The reason is that the file names above have been created in the form
+    // /a/b/c.parquet and the format of the selection root must match that of the file names
+    // otherwise downstream operations such as partition pruning can break.
+    final Path metaRootPath = Path.getPathWithoutSchemeAndAuthority(metaRootDir.getPath());
+    return FileSelection.create(selection.getStatuses(fs), fileNames, metaRootPath.toString());
--- End diff --
`FileSelection.create()` expects either a list of statuses or a list of
filenames, but not both.
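
To make the point concrete, here is a minimal, self-contained sketch of an "either statuses or file names" contract. `SelectionSketch` is a hypothetical stand-in, not Drill's `FileSelection`; statuses are modelled as plain strings, and the choice to throw when both (or neither) are supplied is an assumption for illustration, since the comment above does not say how the real method reacts.

```java
import java.util.List;

// Hypothetical stand-in for the contract described in the review comment.
// This is NOT Drill's FileSelection; it only illustrates "one input or the other".
final class SelectionSketch {
    final List<String> statuses; // stand-in for a list of file statuses
    final List<String> files;    // file names, e.g. /a/b/c.parquet
    final String root;           // selection root, e.g. /a/b

    private SelectionSketch(List<String> statuses, List<String> files, String root) {
        this.statuses = statuses;
        this.files = files;
        this.root = root;
    }

    // Accepts exactly one of the two lists; rejecting the other cases by
    // throwing is an assumption made for this sketch.
    static SelectionSketch create(List<String> statuses, List<String> files, String root) {
        boolean hasStatuses = statuses != null && !statuses.isEmpty();
        boolean hasFiles = files != null && !files.isEmpty();
        if (hasStatuses == hasFiles) {
            throw new IllegalArgumentException(
                "provide either statuses or file names, not both and not neither");
        }
        return new SelectionSketch(statuses, files, root);
    }

    public static void main(String[] args) {
        List<String> fileNames = List.of("/a/b/c.parquet");

        // File names only: the shape the review comment suggests the call should have.
        SelectionSketch ok = create(null, fileNames, "/a/b");
        System.out.println("accepted: " + ok.files + " under " + ok.root);

        // Both lists at once: the shape of the call in the diff above.
        try {
            create(List.of("some-status"), fileNames, "/a/b");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under this reading, the call in the diff would pass only the file names read from the metadata cache and leave the statuses argument empty.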
> Do lazy reading of parquet metadata cache file
> ----------------------------------------------
>
> Key: DRILL-4287
> URL: https://issues.apache.org/jira/browse/DRILL-4287
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.4.0
> Reporter: Aman Sinha
> Assignee: Jinfeng Ni
>
> Currently, the parquet metadata cache file is read eagerly during creation of
> the DrillTable (as part of ParquetFormatMatcher.isReadable()). This is not
> desirable from a performance standpoint, since there are scenarios where we want
> to do some up-front optimizations (e.g. directory-based partition pruning,
> see DRILL-2517, or a potential limit 0 optimization), and in such
> situations it is better to read the metadata cache file lazily.
> This is a placeholder to perform such delayed reading, since it is needed for
> the aforementioned optimizations.
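
The eager-versus-lazy distinction in the description can be sketched as a value that is loaded on first use and then cached. `LazyMetadata` and its loader are hypothetical names for illustration and do not reflect how Drill implements the change; the loader stands in for whatever reads the metadata cache file.

```java
import java.util.function.Supplier;

// Hypothetical holder that defers an expensive read until the value is needed.
final class LazyMetadata<T> {
    private final Supplier<T> loader; // stands in for reading the metadata cache file
    private T value;
    private boolean loaded;

    LazyMetadata(Supplier<T> loader) {
        this.loader = loader;
    }

    // Runs the loader on the first call only; later calls return the cached value.
    synchronized T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
        }
        return value;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }

    public static void main(String[] args) {
        LazyMetadata<String> metadata = new LazyMetadata<>(() -> {
            System.out.println("reading metadata cache file");
            return "parsed metadata";
        });

        // Planning steps that never call get() (for example, pruning by
        // directory name alone) never pay for the read.
        System.out.println("loaded before use: " + metadata.isLoaded());

        metadata.get();
        metadata.get(); // the loader is not run a second time
        System.out.println("loaded after use: " + metadata.isLoaded());
    }
}
```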
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)