Parth Chandra created DRILL-4380:
------------------------------------

             Summary: Fix performance regression: in creation of FileSelection 
in ParquetFormatPlugin to not set files if metadata cache is available.
                 Key: DRILL-4380
                 URL: https://issues.apache.org/jira/browse/DRILL-4380
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Parth Chandra



The regression has been caused by the changes in 
367d74a65ce2871a1452361cbd13bbd5f4a6cc95 (DRILL-2618: handle queries over empty 
folders consistently so that they report table not found rather than failing.)

In ParquetFormatPlugin, the original code created a FileSelection object in the 
following code:
{code}
return new FileSelection(fileNames, metaRootPath.toString(), metadata, 
selection.getFileStatusList(fs));
{code}
The selection.getFileStatusList call made an inexpensive call to 
FileSelection.init(). The call was inexpensive because the FileSelection.files 
member was not set, so the code did not need to make an expensive call to get 
the file statuses corresponding to the files in the FileSelection.files member.
In the new code, this is replaced by:
{code}
final FileSelection newSelection = FileSelection.create(null, fileNames, 
metaRootPath.toString());
return ParquetFileSelection.create(newSelection, metadata);
{code}
This sets the FileSelection.files member but not the FileSelection.statuses 
member. A subsequent call to FileSelection.getStatuses (in ParquetGroupScan()) 
now makes an expensive call to get all the statuses.

It appears that there was an implicit assumption that the 
FileSelection.statuses member should be set before the FileSelection.files 
member is set. This assumption is no longer true.
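
To illustrate the difference in cost, here is a minimal sketch of the 
behavior described above. It is a simplified model, not Drill's actual 
FileSelection: the class FileSelectionSketch, the CountingFs stand-in, and the 
use of plain strings for statuses are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the behavior described in this issue.
// This is NOT Drill's FileSelection; all names here are illustrative only.
class FileSelectionSketch {

    /** Stand-in for a file system whose status lookups are expensive. */
    static class CountingFs {
        int lookups = 0;

        String getFileStatus(String file) {
            lookups++; // each call models one expensive file system request
            return "status:" + file;
        }
    }

    private final List<String> statuses; // may be null
    private final List<String> files;    // may be null

    FileSelectionSketch(List<String> statuses, List<String> files) {
        this.statuses = statuses;
        this.files = files;
    }

    /** Returns statuses if already set; otherwise does one lookup per file. */
    List<String> getStatuses(CountingFs fs) {
        if (statuses != null) {
            return statuses;
        }
        List<String> result = new ArrayList<>();
        if (files != null) {
            for (String f : files) {
                result.add(fs.getFileStatus(f));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a.parquet", "b.parquet", "c.parquet");

        // Models the original code path: files not set, nothing to look up.
        CountingFs fsOld = new CountingFs();
        new FileSelectionSketch(null, null).getStatuses(fsOld);
        System.out.println("files unset: " + fsOld.lookups + " lookups");

        // Models the new code path: files set, statuses not set.
        CountingFs fsNew = new CountingFs();
        new FileSelectionSketch(null, names).getStatuses(fsNew);
        System.out.println("files set:   " + fsNew.lookups + " lookups");
    }
}
```

In this model the number of lookups grows with the number of files whenever 
files is set and statuses is not, which is the pattern the regression is 
attributed to above.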




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
