[
https://issues.apache.org/jira/browse/IMPALA-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong resolved IMPALA-7320.
-----------------------------------
Resolution: Fixed
> Loading HDFS tables calls getFileStatus on each partition serially
> ------------------------------------------------------------------
>
> Key: IMPALA-7320
> URL: https://issues.apache.org/jira/browse/IMPALA-7320
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Affects Versions: Impala 3.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Major
>
> The catalog caches the access level (permissions) of each of the partitions
> in an HDFS table. These are all loaded when the table is first loaded, by
> making serial calls to getFileStatus() on each of the partitions.
> In most cases, all of the partitions are in a single directory and we could
> get all of the information through a single call to listFileStatus() on the
> parent. In my testing, a typical getFileStatus call took 1-2 milliseconds, so
> on a large table with tens of thousands of partitions this can shave many
> seconds off of the table load time as well as reduce load on the NN.
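
The idea in the description can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the change that was committed to Impala: FakeNameNode, load_serial, and load_batched are hypothetical names, the NameNode is modeled as an in-memory dictionary that counts calls, and the paths and permission strings are made up. It shows that grouping partitions by parent directory replaces one call per partition with one call per parent. At the quoted 1-2 ms per call, 10,000 partitions would cost roughly 10-20 seconds when fetched serially.

```python
import posixpath
from collections import defaultdict


class FakeNameNode:
    """Hypothetical in-memory stand-in for an HDFS NameNode that counts RPCs."""

    def __init__(self, perms):
        self._perms = dict(perms)  # path -> permission string
        self.rpc_count = 0

    def get_file_status(self, path):
        # One RPC per path, like the serial approach described above.
        self.rpc_count += 1
        return self._perms[path]

    def list_status(self, parent):
        # One RPC returns the status of every direct child of `parent`.
        self.rpc_count += 1
        return {p: perm for p, perm in self._perms.items()
                if posixpath.dirname(p) == parent}


def load_serial(nn, partition_paths):
    """Fetch permissions with one call per partition."""
    return {p: nn.get_file_status(p) for p in partition_paths}


def load_batched(nn, partition_paths):
    """Fetch permissions with one listing call per distinct parent directory."""
    by_parent = defaultdict(list)
    for p in partition_paths:
        by_parent[posixpath.dirname(p)].append(p)

    result = {}
    for parent, children in by_parent.items():
        listing = nn.list_status(parent)
        for p in children:
            # Fall back to a per-path call if the listing did not include it.
            result[p] = listing[p] if p in listing else nn.get_file_status(p)
    return result


def demo(num_partitions=1000):
    """Return (serial RPC count, batched RPC count, whether results match)."""
    perms = {f"/warehouse/t/day={i}": "rwxr-xr-x" for i in range(num_partitions)}
    paths = list(perms)

    serial_nn = FakeNameNode(perms)
    serial = load_serial(serial_nn, paths)

    batched_nn = FakeNameNode(perms)
    batched = load_batched(batched_nn, paths)

    return serial_nn.rpc_count, batched_nn.rpc_count, serial == batched
```

Calling demo() with the default arguments issues 1000 calls in the serial case and a single call in the batched case, with identical results. Partitions that live outside the common parent simply form their own groups, so the batched version never issues more calls than the serial one when every partition appears in its parent's listing.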