[
https://issues.apache.org/jira/browse/IMPALA-12929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Quanlong Huang resolved IMPALA-12929.
-------------------------------------
Target Version: Impala 4.4.0
Resolution: Fixed
Resolved in the following commit:
commit 0d49c9d6cc7fc0903d60a78d8aaa996af0249c06
Author: stiga-huang <[email protected]>
Date: Thu Mar 21 14:37:33 2024 +0800
IMPALA-12929: Skip loading HDFS permissions in local-catalog mode
HDFS file/dir permissions are not used at all in local catalog mode - in
LocalFsTable, hasWriteAccessToBaseDir() always returns true and
getFirstLocationWithoutWriteAccess() always returns null.
However, in catalogd, we still load them (in a single thread per table!)
which could dominate the table loading time when there are lots of
partitions. Note that the table loading process in catalogd is the same
no matter what catalog mode is in use. The difference between catalog
modes is mainly in how coordinators get metadata from catalogd. Local
catalog mode is turned on by setting --catalog_topic_mode=minimal on
catalogd and --use_local_catalog=true on coordinators.
This patch skips loading HDFS permissions on catalogd when running in
local catalog mode. We can revisit it in IMPALA-7539.
Tests:
- Ran CORE tests
Change-Id: I5baa9f6ab0d3888a78ff161ae5caa19e85bc983a
Reviewed-on: http://gerrit.cloudera.org:8080/21178
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
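
The decision described in the commit message can be sketched as a small predicate. This is only an illustration, not the actual patch: the class and function names below are hypothetical, and the only value taken from the message is --catalog_topic_mode=minimal. The "full" value used in the usage lines is an assumed example of a non-local-catalog setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogdFlags:
    """Hypothetical holder for the catalogd startup flag named in the commit."""
    catalog_topic_mode: str  # value passed via --catalog_topic_mode


def should_load_hdfs_permissions(flags: CatalogdFlags) -> bool:
    """Return False only when catalogd runs in local catalog mode.

    Per the commit message, local catalog mode is enabled on catalogd with
    --catalog_topic_mode=minimal, and in that mode the loaded permissions
    are never consulted, so loading them is wasted work. Any other value is
    treated conservatively here as still needing permissions.
    """
    return flags.catalog_topic_mode.strip().lower() != "minimal"


# Usage: the per-partition permission preload would be gated on the predicate.
local_mode = CatalogdFlags(catalog_topic_mode="minimal")
other_mode = CatalogdFlags(catalog_topic_mode="full")  # assumed example value
skip_in_local_mode = not should_load_hdfs_permissions(local_mode)
load_in_other_mode = should_load_hdfs_permissions(other_mode)
```

The predicate is kept separate from the loading code so the expensive step can be skipped in one place without touching the rest of the table loading path.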
> Skip loading HDFS permissions in local-catalog mode
> ---------------------------------------------------
>
> Key: IMPALA-12929
> URL: https://issues.apache.org/jira/browse/IMPALA-12929
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> As IMPALA-7539 is unresolved, HDFS file/dir permissions are not used at all
> in local catalog mode. However, in catalogd, we still load them (in a single
> thread per table!) which could dominate the table loading time when there
> are lots of partitions. Here is an example timeline for a REFRESH statement
> on an unloaded table:
> {noformat}
> Catalog Server Operation: 2s300ms
> - Got catalog version read lock: 26.407us (26.407us)
> - Start loading table: 314.663us (288.256us)
> - Got Metastore client: 629.599us (314.936us)
> - Fetched table from Metastore: 7.248ms (6.618ms)
> - Loaded table schema: 27.947ms (20.699ms)
> - Preloaded permissions cache for 1824 partitions: 1s514ms (1s486ms)
> - Got access level: 1s514ms (588.314us)
> - Created partition builders: 2s103ms (588.270ms)
> - Start loading file metadata: 2s103ms (49.760us)
> - Loaded file metadata for 1824 partitions: 2s282ms (179.839ms)
> - Async loaded table: 2s289ms (6.931ms)
> - Loaded table from scratch: 2s289ms (72.038us)
> - Got table read lock: 2s289ms (2.289us)
> - Finished resetMetadata request: 2s300ms (10.188ms){noformat}
> The majority of the time is spent in "Preloaded permissions cache for 1824
> partitions".
> Currently, catalogd can skip loading HDFS permissions in local catalog mode
> until IMPALA-7539 is resolved.
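
To make the claim about the timeline concrete, here is a minimal sketch that parses the quoted REFRESH timeline and ranks the steps by their per-step durations (the values in parentheses). The parsing helpers are illustrative and not part of Impala; the data is copied verbatim from the timeline in the issue description.

```python
import re

# Timeline copied from the issue description: "name: elapsed (step duration)".
TIMELINE = """\
Got catalog version read lock: 26.407us (26.407us)
Start loading table: 314.663us (288.256us)
Got Metastore client: 629.599us (314.936us)
Fetched table from Metastore: 7.248ms (6.618ms)
Loaded table schema: 27.947ms (20.699ms)
Preloaded permissions cache for 1824 partitions: 1s514ms (1s486ms)
Got access level: 1s514ms (588.314us)
Created partition builders: 2s103ms (588.270ms)
Start loading file metadata: 2s103ms (49.760us)
Loaded file metadata for 1824 partitions: 2s282ms (179.839ms)
Async loaded table: 2s289ms (6.931ms)
Loaded table from scratch: 2s289ms (72.038us)
Got table read lock: 2s289ms (2.289us)
Finished resetMetadata request: 2s300ms (10.188ms)
"""

UNIT_TO_MS = {"us": 1e-3, "ms": 1.0, "s": 1e3}
TOKEN = re.compile(r"(\d+(?:\.\d+)?)(us|ms|s)")
LINE = re.compile(r"^(?P<name>.+?): (?P<elapsed>\S+) \((?P<delta>[^)]+)\)$")


def to_ms(text: str) -> float:
    """Convert a duration such as '1s486ms' or '26.407us' to milliseconds."""
    return sum(float(value) * UNIT_TO_MS[unit] for value, unit in TOKEN.findall(text))


def step_durations(timeline: str) -> list[tuple[str, float]]:
    """Return (step name, step duration in ms) for every timeline line."""
    steps = []
    for line in timeline.splitlines():
        match = LINE.match(line.strip())
        if match:
            steps.append((match["name"], to_ms(match["delta"])))
    return steps


TOTAL_MS = to_ms("2s300ms")  # "Catalog Server Operation: 2s300ms"
ranked = sorted(step_durations(TIMELINE), key=lambda step: step[1], reverse=True)

if __name__ == "__main__":
    for name, ms in ranked[:3]:
        print(f"{name}: {ms:.1f} ms ({100 * ms / TOTAL_MS:.1f}% of total)")
```

With these numbers, the permissions preload accounts for 1486 ms of the 2300 ms operation, roughly 65% of the total, followed by creating partition builders and loading file metadata.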