[
https://issues.apache.org/jira/browse/DRILL-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139872#comment-15139872
]
ASF GitHub Bot commented on DRILL-4380:
---------------------------------------
Github user hnfgns commented on a diff in the pull request:
https://github.com/apache/drill/pull/369#discussion_r52383230
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java ---
@@ -233,7 +233,7 @@ private FileSelection expandSelection(DrillFileSystem fs, FileSelection selectio
// /a/b/c.parquet and the format of the selection root must match that of the file names
// otherwise downstream operations such as partition pruning can break.
final Path metaRootPath = Path.getPathWithoutSchemeAndAuthority(metaRootDir.getPath());
- final FileSelection newSelection = FileSelection.create(null, fileNames, metaRootPath.toString());
+ final FileSelection newSelection = new FileSelection(selection.getStatuses(fs), fileNames, metaRootPath.toString());
--- End diff --
Filed DRILL-4381. Thanks.
> Fix performance regression: in creation of FileSelection in
> ParquetFormatPlugin to not set files if metadata cache is available.
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-4380
> URL: https://issues.apache.org/jira/browse/DRILL-4380
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Parth Chandra
>
> The regression has been caused by the changes in
> 367d74a65ce2871a1452361cbd13bbd5f4a6cc95 (DRILL-2618: handle queries over
> empty folders consistently so that they report table not found rather than
> failing.)
> In ParquetFormatPlugin, the original code created a FileSelection object in
> the following code:
> {code}
> return new FileSelection(fileNames, metaRootPath.toString(), metadata,
> selection.getFileStatusList(fs));
> {code}
> The selection.getFileStatusList call made an inexpensive call to
> FileSelection.init(). The call was inexpensive because the
> FileSelection.files member was not set, so the code did not need to make an
> expensive call to get the file statuses corresponding to the files in the
> FileSelection.files member.
> In the new code, this is replaced by
> {code}
> final FileSelection newSelection = FileSelection.create(null, fileNames,
> metaRootPath.toString());
> return ParquetFileSelection.create(newSelection, metadata);
> {code}
> This sets the FileSelection.files member but not the FileSelection.statuses
> member. A subsequent call to FileSelection.getStatuses (in
> ParquetGroupScan()) now makes an expensive call to get all the statuses.
> It appears that there was an implicit assumption that the
> FileSelection.statuses member should be set before the FileSelection.files
> member is set. This assumption is no longer true.
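
The mechanism described above can be illustrated with a minimal sketch. This is not Drill's actual FileSelection: the class SelectionSketch, the lookups counter, and the lookupStatus helper are hypothetical names invented for illustration, and statuses are modelled as plain strings. It only shows why a selection built with files but without statuses pays one file system lookup per file on the first getStatuses() call, while a selection that is handed its statuses up front pays none.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the pattern; not Drill's FileSelection.
class SelectionSketch {
    // Counts simulated file system round trips so the cost is observable.
    static int lookups = 0;

    private final List<String> files;
    private List<String> statuses; // null until supplied or resolved

    SelectionSketch(List<String> statuses, List<String> files) {
        this.statuses = statuses;
        this.files = files;
    }

    // Stand-in for one per-file status request against the file system.
    private static String lookupStatus(String file) {
        lookups++;
        return "status:" + file;
    }

    // Resolves statuses lazily: only when none were supplied up front.
    List<String> getStatuses() {
        if (statuses == null) {
            List<String> resolved = new ArrayList<>();
            for (String file : files) {
                resolved.add(lookupStatus(file));
            }
            statuses = resolved;
        }
        return statuses;
    }

    public static void main(String[] args) {
        List<String> files =
            Arrays.asList("/a/b/1.parquet", "/a/b/2.parquet", "/a/b/3.parquet");

        // Files only: the first getStatuses() call pays one lookup per file.
        new SelectionSketch(null, files).getStatuses();
        System.out.println("lookups without statuses: " + lookups);

        // Statuses supplied at construction: getStatuses() performs no lookups.
        lookups = 0;
        List<String> known = Arrays.asList("s1", "s2", "s3");
        new SelectionSketch(known, files).getStatuses();
        System.out.println("lookups with statuses: " + lookups);
    }
}
```

In this sketch the first case corresponds to creating the selection with a null status list, and the second to passing already-known statuses to the constructor, which is the direction the diff in the pull request takes.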