[ https://issues.apache.org/jira/browse/DRILL-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139805#comment-15139805 ]
ASF GitHub Bot commented on DRILL-4380:
---------------------------------------
Github user jacques-n commented on a diff in the pull request:
https://github.com/apache/drill/pull/369#discussion_r52376432
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java ---
@@ -233,7 +233,7 @@ private FileSelection expandSelection(DrillFileSystem fs, FileSelection selectio
     // /a/b/c.parquet and the format of the selection root must match that of the file names
     // otherwise downstream operations such as partition pruning can break.
     final Path metaRootPath = Path.getPathWithoutSchemeAndAuthority(metaRootDir.getPath());
-    final FileSelection newSelection = FileSelection.create(null, fileNames, metaRootPath.toString());
+    final FileSelection newSelection = new FileSelection(selection.getStatuses(fs), fileNames, metaRootPath.toString());
--- End diff --
It seems like we keep having issues with misuse of this interface, which
causes planning regressions. Do you think it makes sense to either change the
API or add additional comments to make sure people aren't doing the wrong thing?
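One way to act on the "change the API" suggestion is to make the only factory that accepts an explicit file list also require the statuses the caller already holds, so a selection can never be built with `files` set and `statuses` missing. The sketch below is hypothetical: `GuardedSelection` and `withKnownStatuses` are illustrative names, not part of Drill, and statuses and files are modeled as plain strings rather than Hadoop `FileStatus` objects.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical, misuse-resistant selection type; NOT Drill's FileSelection API.
final class GuardedSelection {
    private final List<String> statuses;
    private final List<String> files;
    private final String selectionRoot;

    // Private constructor: callers must go through the factory below.
    private GuardedSelection(List<String> statuses, List<String> files, String selectionRoot) {
        this.statuses = statuses;
        this.files = files;
        this.selectionRoot = selectionRoot;
    }

    // The only entry point that takes an explicit file list. Requiring the statuses
    // here means no later getStatuses() call has to re-resolve every file.
    static GuardedSelection withKnownStatuses(List<String> statuses, List<String> files, String selectionRoot) {
        Objects.requireNonNull(statuses, "statuses must be supplied together with files");
        Objects.requireNonNull(files, "files must not be null");
        return new GuardedSelection(List.copyOf(statuses), List.copyOf(files), selectionRoot);
    }

    List<String> getStatuses() { return statuses; }
    List<String> getFiles() { return files; }
    String getSelectionRoot() { return selectionRoot; }
}
```

The design choice is to fail fast at construction time instead of silently falling back to an expensive lookup later; the alternative the comment mentions, documenting the required call order, leaves the trap in place.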
> Fix performance regression: in creation of FileSelection in
> ParquetFormatPlugin to not set files if metadata cache is available.
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-4380
> URL: https://issues.apache.org/jira/browse/DRILL-4380
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Parth Chandra
>
> The regression was caused by the changes in commit
> 367d74a65ce2871a1452361cbd13bbd5f4a6cc95 (DRILL-2618: handle queries over
> empty folders consistently so that they report table not found rather than
> failing.)
> In ParquetFormatPlugin, the original code created the FileSelection object as
> follows:
> {code}
> return new FileSelection(fileNames, metaRootPath.toString(), metadata, selection.getFileStatusList(fs));
> {code}
> The selection.getFileStatusList call made an inexpensive call to
> FileSelection.init(). The call was inexpensive because the
> FileSelection.files member was not set, so the code did not need to make an
> expensive call to get the file statuses corresponding to the files in the
> FileSelection.files member.
> In the new code, this is replaced by:
> {code}
> final FileSelection newSelection = FileSelection.create(null, fileNames, metaRootPath.toString());
> return ParquetFileSelection.create(newSelection, metadata);
> {code}
> This sets the FileSelection.files member but not the FileSelection.statuses
> member. A subsequent call to FileSelection.getStatuses (in
> ParquetGroupScan()) now makes an expensive call to get all the statuses.
> It appears that there was an implicit assumption that the
> FileSelection.statuses member should be set before the FileSelection.files
> member is set. This assumption is no longer true.
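The cost difference described in the issue can be modeled with a small, self-contained sketch. It assumes a simplified selection whose `getStatuses()` returns pre-supplied statuses when present and otherwise resolves one status per entry in the file list. `LazySelectionDemo`, `Selection`, `lookupStatus`, and `countLookups` are hypothetical names, and the lookup counter stands in for file-system round trips; this is not Drill's actual `FileSelection` implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the lazy status resolution described in DRILL-4380.
// None of these names are Drill classes; they only illustrate the cost difference.
class LazySelectionDemo {

    // Counts simulated file-system round trips (the "expensive" calls).
    static int statusLookups = 0;

    static String lookupStatus(String path) {
        statusLookups++;
        return "status(" + path + ")";
    }

    static final class Selection {
        private List<String> statuses;   // null until supplied or resolved
        private final List<String> files;

        Selection(List<String> statuses, List<String> files) {
            this.statuses = statuses;
            this.files = files;
        }

        // Returns statuses if already present; otherwise resolves one status per file.
        List<String> getStatuses() {
            if (statuses == null) {
                List<String> resolved = new ArrayList<>();
                if (files != null) {
                    for (String f : files) {
                        resolved.add(lookupStatus(f)); // expensive path
                    }
                }
                statuses = resolved;
            }
            return statuses;
        }
    }

    // Builds a selection, asks it for statuses, and reports how many lookups that cost.
    static int countLookups(List<String> statuses, List<String> files) {
        statusLookups = 0;
        new Selection(statuses, files).getStatuses();
        return statusLookups;
    }

    public static void main(String[] args) {
        List<String> files = List.of("/a/b/c.parquet", "/a/b/d.parquet", "/a/b/e.parquet");
        // Regressed pattern: only files are set, so every file is resolved.
        System.out.println("files only:   " + countLookups(null, files));
        // Fixed pattern: statuses are supplied up front, so nothing is resolved.
        System.out.println("statuses set: " + countLookups(List.of("status(/a/b)"), files));
    }
}
```

Running `main` reports 3 lookups for the files-only selection and 0 when statuses are supplied, mirroring why passing `selection.getStatuses(fs)` into the constructor avoids the per-file lookups.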
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)