[ 
https://issues.apache.org/jira/browse/IMPALA-12929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-12929.
-------------------------------------
    Target Version: Impala 4.4.0
        Resolution: Fixed

Resolved in the following commit:

commit 0d49c9d6cc7fc0903d60a78d8aaa996af0249c06
Author: stiga-huang <[email protected]>
Date:   Thu Mar 21 14:37:33 2024 +0800

    IMPALA-12929: Skip loading HDFS permissions in local-catalog mode
    
    HDFS file/dir permissions are not used at all in local catalog mode - in
    LocalFsTable, hasWriteAccessToBaseDir() always returns true and
    getFirstLocationWithoutWriteAccess() always returns null.
    
    However, catalogd still loads them (in a single thread per table!),
    which can dominate the table loading time when there are lots of
    partitions. Note that the table loading process in catalogd is the same
    no matter which catalog mode is in use. The difference between catalog
    modes is mainly in how coordinators get metadata from catalogd. Local
    catalog mode is turned on by setting --catalog_topic_mode=minimal on
    catalogd and --use_local_catalog=true on coordinators.
    
    This patch skips loading HDFS permissions on catalogd when running in
    local catalog mode. We can revisit it in IMPALA-7539.
    
    Tests:
     - Ran CORE tests
    
    Change-Id: I5baa9f6ab0d3888a78ff161ae5caa19e85bc983a
    Reviewed-on: http://gerrit.cloudera.org:8080/21178
    Reviewed-by: Impala Public Jenkins <[email protected]>
    Tested-by: Impala Public Jenkins <[email protected]>
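
The idea behind the patch can be illustrated with a minimal Java sketch. This is not the code from the commit: the class, the AccessLevel enum, the PermissionChecker interface and the loadAccessLevels() method are all hypothetical names, and defaulting to READ_WRITE when the lookup is skipped is an assumption made for illustration (it mirrors LocalFsTable.hasWriteAccessToBaseDir() always returning true).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PermissionLoadingSketch {
    /** Hypothetical access levels; not Impala's real enum. */
    public enum AccessLevel { READ_ONLY, READ_WRITE }

    /** Hypothetical stand-in for a (slow) per-location HDFS permission lookup. */
    public interface PermissionChecker {
        AccessLevel check(String location);
    }

    /**
     * Computes an access level per partition location. When localCatalogMode is
     * true the lookup is skipped entirely and READ_WRITE is assumed, since the
     * result would not be consumed in that mode (assumption for this sketch).
     */
    public static Map<String, AccessLevel> loadAccessLevels(
            List<String> locations, boolean localCatalogMode, PermissionChecker checker) {
        Map<String, AccessLevel> result = new LinkedHashMap<>();
        for (String location : locations) {
            AccessLevel level = localCatalogMode
                    ? AccessLevel.READ_WRITE      // skip the costly lookup
                    : checker.check(location);    // single-threaded lookup per location
            result.put(location, level);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] lookups = {0};
        PermissionChecker counting = location -> {
            lookups[0]++;
            return AccessLevel.READ_ONLY;
        };
        List<String> locations = List.of("/warehouse/t/p=1", "/warehouse/t/p=2");

        loadAccessLevels(locations, true, counting);
        System.out.println("lookups with local catalog mode on: " + lookups[0]);

        loadAccessLevels(locations, false, counting);
        System.out.println("lookups with local catalog mode off: " + lookups[0]);
    }
}
```

With the flag on, the checker is never invoked, so the per-partition cost disappears regardless of how many partitions the table has.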

> Skip loading HDFS permissions in local-catalog mode
> ---------------------------------------------------
>
>                 Key: IMPALA-12929
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12929
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> As IMPALA-7539 is unresolved, HDFS file/dir permissions are not used at all 
> in local catalog mode. However, catalogd still loads them (in a single 
> thread per table!), which can dominate the table loading time when there 
> are lots of partitions. Here is an example timeline for a REFRESH statement 
> on an unloaded table:
> {noformat}
> Catalog Server Operation: 2s300ms
>    - Got catalog version read lock: 26.407us (26.407us)
>    - Start loading table: 314.663us (288.256us)
>    - Got Metastore client: 629.599us (314.936us)
>    - Fetched table from Metastore: 7.248ms (6.618ms)
>    - Loaded table schema: 27.947ms (20.699ms)
>    - Preloaded permissions cache for 1824 partitions: 1s514ms (1s486ms)
>    - Got access level: 1s514ms (588.314us)
>    - Created partition builders: 2s103ms (588.270ms)
>    - Start loading file metadata: 2s103ms (49.760us)
>    - Loaded file metadata for 1824 partitions: 2s282ms (179.839ms)
>    - Async loaded table: 2s289ms (6.931ms)
>    - Loaded table from scratch: 2s289ms (72.038us)
>    - Got table read lock: 2s289ms (2.289us)
>    - Finished resetMetadata request: 2s300ms (10.188ms)
> {noformat}
> The majority of the time is spent in "Preloaded permissions cache for 1824 
> partitions".
> Until IMPALA-7539 is resolved, catalogd can skip loading HDFS permissions 
> in local catalog mode.
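
To put the timeline above in proportion, the following small Java sketch computes each of the three longest steps as a share of the total. The durations are copied from the timeline (which itself reports rounded values); the class and method names are hypothetical and exist only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TimelineShare {
    /** Returns stepMs as a percentage of totalMs, rounded to one decimal place. */
    public static double sharePercent(double stepMs, double totalMs) {
        return Math.round(stepMs / totalMs * 1000.0) / 10.0;
    }

    public static void main(String[] args) {
        // Total "Catalog Server Operation" time from the timeline: 2s300ms.
        double totalMs = 2300.0;

        // Per-step durations (the values in parentheses in the timeline).
        Map<String, Double> stepsMs = new LinkedHashMap<>();
        stepsMs.put("Preloaded permissions cache for 1824 partitions", 1486.0);
        stepsMs.put("Created partition builders", 588.270);
        stepsMs.put("Loaded file metadata for 1824 partitions", 179.839);

        for (Map.Entry<String, Double> step : stepsMs.entrySet()) {
            System.out.println(step.getKey() + ": "
                    + sharePercent(step.getValue(), totalMs) + "% of total");
        }
    }
}
```

By this calculation the permissions preload alone accounts for roughly two thirds of the operation.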



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
