[
https://issues.apache.org/jira/browse/DRILL-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139809#comment-15139809
]
ASF GitHub Bot commented on DRILL-4380:
---------------------------------------
Github user jacques-n commented on a diff in the pull request:
https://github.com/apache/drill/pull/369#discussion_r52376878
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java ---
@@ -183,12 +194,16 @@ private static String buildPath(final String[] path, final int folderIndex) {
}
public static FileSelection create(final DrillFileSystem fs, final String parent, final String path) throws IOException {
+ Stopwatch timer = Stopwatch.createStarted();
final Path combined = new Path(parent, removeLeadingSlash(path));
final FileStatus[] statuses = fs.globStatus(combined);
if (statuses == null) {
return null;
}
- return create(Lists.newArrayList(statuses), null, combined.toUri().toString());
+ final FileSelection fileSel = create(Lists.newArrayList(statuses), null, combined.toUri().toString());
+ logger.info("FileSelection.create() took {} ms ", timer.elapsed(TimeUnit.MILLISECONDS));
--- End diff --
INFO => DEBUG: this timing message should be logged at debug level, not info.
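A minimal sketch of the idea, using only the JDK so that it is self-contained. The class name DebugTimingSketch and the helper timeMillis are hypothetical and not part of Drill. The `{}` placeholder in the diff suggests an SLF4J-style logger, where the change would simply be logger.debug(...) in place of logger.info(...); here java.util.logging's Level.FINE stands in for debug.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration; not code from the pull request.
class DebugTimingSketch {
    private static final Logger logger =
        Logger.getLogger(DebugTimingSketch.class.getName());

    /** Runs the task and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        long elapsed = timeMillis(() -> {
            // Placeholder for the work being measured.
        });
        // FINE is below the default INFO threshold, so this message stays
        // out of the log unless fine-grained logging is switched on.
        logger.log(Level.FINE, "create() took {0} ms", elapsed);
    }
}
```

With the default logging configuration nothing is printed, which is the point: per-call timing only shows up when someone asks for it.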
> Fix performance regression: in creation of FileSelection in
> ParquetFormatPlugin to not set files if metadata cache is available.
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-4380
> URL: https://issues.apache.org/jira/browse/DRILL-4380
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Parth Chandra
>
> The regression was caused by the changes in commit
> 367d74a65ce2871a1452361cbd13bbd5f4a6cc95 (DRILL-2618: handle queries over
> empty folders consistently so that they report table not found rather than
> failing).
> In ParquetFormatPlugin, the original code created a FileSelection object as
> follows:
> {code}
> return new FileSelection(fileNames, metaRootPath.toString(), metadata,
> selection.getFileStatusList(fs));
> {code}
> The selection.getFileStatusList call made an inexpensive call to
> FileSelection.init(). The call was inexpensive because the
> FileSelection.files member was not set, so the code did not need to make an
> expensive call to fetch the file statuses corresponding to the files in the
> FileSelection.files member.
> In the new code, this is replaced by:
> {code}
> final FileSelection newSelection = FileSelection.create(null, fileNames,
> metaRootPath.toString());
> return ParquetFileSelection.create(newSelection, metadata);
> {code}
> This sets the FileSelection.files member but not the FileSelection.statuses
> member. A subsequent call to FileSelection.getStatuses (in
> ParquetGroupScan()) now makes an expensive call to get all the statuses.
> It appears that there was an implicit assumption that the
> FileSelection.statuses member should be set before the FileSelection.files
> member is set. This assumption is no longer true.
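The effect described above can be sketched with a simplified model. Everything here is hypothetical: LazySelectionSketch and its nested Selection class only mimic the files/statuses relationship and are not Drill's FileSelection, and strings stand in for FileStatus objects. The counter makes the cost of resolving statuses from file names visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the files/statuses relationship; not Drill code.
class LazySelectionSketch {
    /** Counts simulated file system lookups. */
    static int expensiveLookups = 0;

    static class Selection {
        private final List<String> files;  // file names, may be null
        private List<String> statuses;     // stand-in for file statuses, may be null

        Selection(List<String> statuses, List<String> files) {
            this.statuses = statuses;
            this.files = files;
        }

        /** Returns cached statuses, or resolves them from the file names. */
        List<String> getStatuses() {
            if (statuses == null) {
                statuses = new ArrayList<>();
                if (files != null) {
                    for (String file : files) {
                        expensiveLookups++;  // simulates one file system call per file
                        statuses.add("status:" + file);
                    }
                }
            }
            return statuses;
        }
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a.parquet", "b.parquet", "c.parquet");

        // Statuses supplied up front: getStatuses() needs no lookups.
        Selection withStatuses = new Selection(Arrays.asList("s1", "s2", "s3"), names);
        withStatuses.getStatuses();
        System.out.println("lookups with statuses set: " + expensiveLookups);

        // Only files set: getStatuses() pays one lookup per file.
        Selection filesOnly = new Selection(null, names);
        filesOnly.getStatuses();
        System.out.println("lookups with only files set: " + expensiveLookups);
    }
}
```

In this model the second selection triggers one lookup per file on first access, which is the shape of the regression reported here; the real cost in Drill depends on the file system and the number of files.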