[ 
https://issues.apache.org/jira/browse/IMPALA-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17952788#comment-17952788
 ] 

Quanlong Huang commented on IMPALA-4105:
----------------------------------------

The initial concern is about double loading for REFRESH on unloaded tables, 
i.e. analyzing the statement triggers metadata loading, and then the REFRESH 
operation on the catalogd side reloads the metadata again.

That's already the case (and unavoidable) when Ranger authorization is enabled, 
since we need to load the table to get the ownership info for privilege checks 
(IMPALA-8228). So Analyzer#dbContainsTable() already invokes getTable(), which 
triggers metadata loading, after this change:
[https://gerrit.cloudera.org/c/14106/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#2839]

To check whether there are Ranger column masking policies on the table for the 
current user, we also load the table to get the column list (IMPALA-10554, 
IMPALA-11281). Although IMPALA-11501 tries to avoid double loading, it still 
happens in many production deployments since 
allow_catalog_cache_op_from_masked_users is false by default (the default will 
be changed in IMPALA-13946).

I think double loading is still better than updating the table multiple times, 
once for each individual partition. Currently, users have to issue multiple 
partition-level REFRESH statements. Each one acquires the table write lock and 
updates the table, which invalidates some caches (e.g. the partition list) on 
the coordinator side. The getPartialCatalogObject requests that coordinators 
send to re-fetch that metadata can be blocked while acquiring the table read 
lock. They contend with the REFRESH statements, which hurts overall 
performance.

With this feature to REFRESH multiple partitions in one statement, the table 
write lock only needs to be acquired once. The refresh also takes less time 
since partitions can be reloaded in parallel.
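
As a rough illustration of the locking argument above, here is a minimal 
Python sketch. It is not Impala code: TableSketch, its methods, and the 
max_workers value are hypothetical names chosen for this example, and a plain 
mutex stands in for the table write lock. It only counts how often the write 
lock is taken in each approach.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TableSketch:
    """Hypothetical stand-in for a catalog table; not Impala's actual class."""

    def __init__(self, partitions):
        self.write_lock = threading.Lock()  # stands in for the table write lock
        self.write_lock_acquisitions = 0
        self.versions = {p: 0 for p in partitions}

    def _load(self, partition):
        # Placeholder for reloading one partition's metadata.
        return partition, self.versions[partition] + 1

    def refresh_one_by_one(self, partitions):
        # One partition-level REFRESH per partition: one lock acquisition each.
        for p in partitions:
            with self.write_lock:
                self.write_lock_acquisitions += 1
                name, version = self._load(p)
                self.versions[name] = version

    def refresh_batch(self, partitions):
        # One multi-partition REFRESH: a single lock acquisition, with the
        # partitions reloaded in parallel while the lock is held.
        with self.write_lock:
            self.write_lock_acquisitions += 1
            with ThreadPoolExecutor(max_workers=4) as pool:
                for name, version in pool.map(self._load, partitions):
                    self.versions[name] = version
```

Both approaches end with the same partition state, but the batched one takes 
the write lock once instead of once per partition, so readers waiting on the 
lock are blocked by fewer separate updates.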

> General partition exprs in REFRESH statements.
> ----------------------------------------------
>
>                 Key: IMPALA-4105
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4105
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>    Affects Versions: Impala 2.8.0
>            Reporter: Alexander Behm
>            Assignee: Amos Bird
>            Priority: Critical
>              Labels: ramp-up
>
> IMPALA-1654 added support for general partition exprs in most DDL operations.
> We should also support general partition exprs for REFRESH statements.
> Example:
> {code}
> REFRESH mytable PARTITION(year > 2015)
> {code}


