[
https://issues.apache.org/jira/browse/IMPALA-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17952788#comment-17952788
]
Quanlong Huang commented on IMPALA-4105:
----------------------------------------
The initial concern was double loading for REFRESH on unloaded tables,
i.e. analyzing the statement triggers metadata loading, and then the REFRESH
operation on the catalogd side reloads the metadata again.
This already happens (unavoidably) when Ranger authorization is enabled, because
we need to load the table to get the ownership info for privilege checks
(IMPALA-8228). After this change, Analyzer#dbContainsTable() already invokes
getTable(), which triggers metadata loading:
[https://gerrit.cloudera.org/c/14106/13/fe/src/main/java/org/apache/impala/analysis/Analyzer.java#2839]
To check whether there are Ranger column masking policies on the table for the
current user, we also load the table to get its column list (IMPALA-10554,
IMPALA-11281). Although IMPALA-11501 tries to avoid double loading, it still
happens in many production deployments because
allow_catalog_cache_op_from_masked_users is false by default (this will be
changed in IMPALA-13946).
I think double loading is still better than updating the table multiple times,
once per individual partition. Currently, users have to issue multiple
partition-level REFRESH statements. Each one acquires the table write lock and
updates the table, which invalidates some caches (e.g. the partition list) on
the coordinator side. The getPartialCatalogObject requests that the coordinator
sends to re-fetch that metadata can then be blocked while acquiring the table
read lock. This contention with the REFRESH statements hurts overall performance.
With this feature, which refreshes multiple partitions in one statement, the
table write lock only needs to be acquired once. The total time is also shorter,
since the partitions can be reloaded in parallel.
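A minimal Java sketch of the locking argument above, under a simplified model: the Table class and its refreshOnePartition(), refreshPartitions() and reloadPartition() methods are hypothetical stand-ins, not catalogd's actual code. It only counts table write-lock acquisitions; it does not model the coordinator's getPartialCatalogObject requests waiting on the read lock.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of a catalog table (not Impala's HdfsTable).
public class RefreshLockSketch {
  static class Table {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final AtomicInteger writeLockAcquisitions = new AtomicInteger();
    final AtomicInteger reloadedPartitions = new AtomicInteger();

    // Hypothetical stand-in for reloading one partition's file metadata.
    void reloadPartition(String name) {
      reloadedPartitions.incrementAndGet();
    }

    // One partition-level REFRESH statement: one write-lock acquisition per partition.
    void refreshOnePartition(String name) {
      lock.writeLock().lock();
      writeLockAcquisitions.incrementAndGet();
      try {
        reloadPartition(name);
      } finally {
        lock.writeLock().unlock();
      }
    }

    // One statement covering several partitions: a single write-lock
    // acquisition, with the partitions reloaded in parallel on a pool.
    void refreshPartitions(List<String> names, ExecutorService pool) throws Exception {
      lock.writeLock().lock();
      writeLockAcquisitions.incrementAndGet();
      try {
        List<Future<?>> futures = new ArrayList<>();
        for (String name : names) {
          futures.add(pool.submit(() -> reloadPartition(name)));
        }
        for (Future<?> f : futures) {
          f.get(); // wait for every reload before releasing the lock
        }
      } finally {
        lock.writeLock().unlock();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> parts = List.of("year=2016", "year=2017", "year=2018");
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      Table perPartition = new Table();
      for (String p : parts) {
        perPartition.refreshOnePartition(p);
      }
      Table batched = new Table();
      batched.refreshPartitions(parts, pool);
      System.out.println("per-partition write locks: " + perPartition.writeLockAcquisitions.get());
      System.out.println("batched write locks: " + batched.writeLockAcquisitions.get());
    } finally {
      pool.shutdown();
    }
  }
}
{code}

Running main() prints 3 write-lock acquisitions for the per-partition statements and 1 for the batched one, with all three partitions reloaded in both cases. The reload tasks do not take the table lock themselves, so they can run while the caller holds the write lock without deadlocking.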
> General partition exprs in REFRESH statements.
> ----------------------------------------------
>
> Key: IMPALA-4105
> URL: https://issues.apache.org/jira/browse/IMPALA-4105
> Project: IMPALA
> Issue Type: New Feature
> Components: Frontend
> Affects Versions: Impala 2.8.0
> Reporter: Alexander Behm
> Assignee: Amos Bird
> Priority: Critical
> Labels: ramp-up
>
> IMPALA-1654 added support for general partition exprs in most DDL operations.
> We should also support general partition exprs for REFRESH statements.
> Example:
> {code}
> REFRESH mytable PARTITION(year > 2015)
> {code}
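For illustration only, a hypothetical Java sketch of what a general partition expression means for REFRESH: the predicate selects a subset of the table's partitions, and only those would be reloaded. PartitionExprSketch and matchingPartitions() are made-up names, and this is not how Impala's analyzer evaluates partition exprs; it just shows which partitions PARTITION(year > 2015) would cover.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: resolving a general partition expression to the
// concrete partitions a REFRESH would reload.
public class PartitionExprSketch {
  // Each partition is represented by its partition-key values, e.g. {year=2016}.
  static List<Map<String, Integer>> matchingPartitions(
      List<Map<String, Integer>> partitions, Predicate<Map<String, Integer>> expr) {
    return partitions.stream().filter(expr).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Map<String, Integer>> partitions = List.of(
        Map.of("year", 2014), Map.of("year", 2015),
        Map.of("year", 2016), Map.of("year", 2017));
    // Equivalent of PARTITION(year > 2015).
    List<Map<String, Integer>> toRefresh =
        matchingPartitions(partitions, p -> p.get("year") > 2015);
    System.out.println(toRefresh); // prints [{year=2016}, {year=2017}]
  }
}
{code}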
--
This message was sent by Atlassian Jira
(v8.20.10#820010)