Csaba Ringhofer created IMPALA-12472:
----------------------------------------
Summary: Skip permission check when refreshing in event processor
Key: IMPALA-12472
URL: https://issues.apache.org/jira/browse/IMPALA-12472
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Reporter: Csaba Ringhofer
Saw callstacks where most of the EventProcessor's time is spent rechecking
the access level of partition directories:
{code}
org.apache.impala.catalog.HdfsTable.getAvailableAccessLevel
org.apache.impala.catalog.HdfsTable.createOrUpdatePartitionBuilder
org.apache.impala.catalog.HdfsTable.createPartitionBuilder
org.apache.impala.catalog.HdfsTable.reloadPartitions
org.apache.impala.catalog.HdfsTable.reloadPartitionsFromNames
org.apache.impala.service.CatalogOpExecutor.reloadPartitionsIfExis
org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadPartitions
org.apache.impala.catalog.events.MetastoreEvents$BatchPartitionEvent.process
{code}
HdfsTable.getAvailableAccessLevel() does a getFileStatus() and, if the access
control list bit is set in the status, also a getAclStatus() call to the namenode.
It is questionable whether we should recheck this when refreshing tables for
directories that were already checked in the past, as the check can be expensive
and its result is unlikely to change. AFAIK stale data shouldn't cause security
issues: if Impala has no right to access/modify the file, the namenode will
simply reject the operation (coordinators/executors use the same username as
catalogd for HDFS ops).
Note that the whole access level check is skipped for most filesystems other
than HDFS (see HdfsTable.assumeReadWriteAccess()).
Currently catalogd checks this for each partition-level event (even if they are
batched). While checking it once during CREATE PARTITION makes sense, rechecking
it for every INSERT and ALTER seems like overkill, especially since an INSERT
shouldn't reduce access rights on a partitioned table.
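A minimal sketch of the policy suggested above, not Impala's actual code: run
the namenode check when a partition is added or has never been seen, and reuse
the stored result for later INSERT and ALTER events. The class, its enums, and
checkOnNamenode() are hypothetical stand-ins; the real check lives in
HdfsTable.getAvailableAccessLevel().

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: decides per event type whether to recheck access. */
class PartitionAccessPolicy {
    /** Simplified event kinds; not the real MetastoreEvents class hierarchy. */
    enum EventType { ADD_PARTITION, ALTER_PARTITION, INSERT }

    /** Simplified stand-in for the access level stored per partition. */
    enum AccessLevel { NONE, READ_ONLY, WRITE_ONLY, READ_WRITE }

    private final Map<String, AccessLevel> known = new HashMap<>();
    private int namenodeChecks = 0;

    /** Stand-in for the getFileStatus()/getAclStatus() round trips. */
    private AccessLevel checkOnNamenode(String partitionDir) {
        namenodeChecks++;
        // Assumption for the sketch: every directory is readable and writable.
        return AccessLevel.READ_WRITE;
    }

    /** Rechecks only when the partition is added or not yet known. */
    AccessLevel onEvent(EventType type, String partitionDir) {
        AccessLevel level = known.get(partitionDir);
        if (type == EventType.ADD_PARTITION || level == null) {
            level = checkOnNamenode(partitionDir);
            known.put(partitionDir, level);
        }
        return level;
    }

    int namenodeChecks() {
        return namenodeChecks;
    }

    public static void main(String[] args) {
        PartitionAccessPolicy policy = new PartitionAccessPolicy();
        String dir = "/warehouse/t/p=1";
        policy.onEvent(EventType.ADD_PARTITION, dir);
        policy.onEvent(EventType.INSERT, dir);
        policy.onEvent(EventType.INSERT, dir);
        policy.onEvent(EventType.ALTER_PARTITION, dir);
        // Only the ADD_PARTITION event triggered a check.
        System.out.println("namenode checks: " + policy.namenodeChecks());
    }
}
```

With this policy a batch of INSERT events on known partitions costs no
namenode calls for the access check, at the price of possibly stale access
levels until the table is reloaded from scratch.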
Besides the event processor, rechecking during REFRESH and reloads after
DML/DDLs is also questionable. If there was an actual change, INVALIDATE
METADATA can be used to reload the table from scratch.
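One way this could look, as a rough sketch under stated assumptions rather than
a proposed patch: keep the access level per directory in a cache that survives
REFRESH-style reloads and is only cleared on a full reload. AccessLevelCache,
its enum, and the injected check function are hypothetical names; the check
function stands in for the namenode calls.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch: caches the access level of each partition directory. */
class AccessLevelCache {
    /** Simplified stand-in for the access level stored per partition. */
    enum AccessLevel { NONE, READ_ONLY, WRITE_ONLY, READ_WRITE }

    private final Map<String, AccessLevel> cache = new ConcurrentHashMap<>();
    private final Function<String, AccessLevel> expensiveCheck;

    /** expensiveCheck stands in for the namenode round trips; must not return null. */
    AccessLevelCache(Function<String, AccessLevel> expensiveCheck) {
        this.expensiveCheck = expensiveCheck;
    }

    /** Runs the expensive check only for directories not seen before. */
    AccessLevel get(String dir) {
        return cache.computeIfAbsent(dir, expensiveCheck);
    }

    /** Drops all cached levels, analogous to reloading the table from scratch. */
    void invalidateAll() {
        cache.clear();
    }

    public static void main(String[] args) {
        AtomicInteger checks = new AtomicInteger();
        AccessLevelCache cache = new AccessLevelCache(dir -> {
            checks.incrementAndGet();
            return AccessLevel.READ_WRITE;
        });
        String dir = "/warehouse/t/p=1";
        cache.get(dir); // first lookup runs the check
        cache.get(dir); // a refresh-style lookup reuses the cached level
        System.out.println("checks before invalidate: " + checks.get());
        cache.invalidateAll();
        cache.get(dir); // a full reload runs the check again
        System.out.println("checks after invalidate: " + checks.get());
    }
}
```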