Csaba Ringhofer created IMPALA-12472:
----------------------------------------

             Summary: Skip permission check when refreshing in event processor
                 Key: IMPALA-12472
                 URL: https://issues.apache.org/jira/browse/IMPALA-12472
             Project: IMPALA
          Issue Type: Improvement
          Components: Catalog
            Reporter: Csaba Ringhofer


Saw callstacks where most of EventProcessor's time is spent rechecking the
access level of partition directories:
{code}
org.apache.impala.catalog.HdfsTable.getAvailableAccessLevel
org.apache.impala.catalog.HdfsTable.createOrUpdatePartitionBuilder
org.apache.impala.catalog.HdfsTable.createPartitionBuilder
org.apache.impala.catalog.HdfsTable.reloadPartitions
org.apache.impala.catalog.HdfsTable.reloadPartitionsFromNames
org.apache.impala.service.CatalogOpExecutor.reloadPartitionsIfExis
org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadPartitions
org.apache.impala.catalog.events.MetastoreEvents$BatchPartitionEvent.process
{code}

HdfsTable.getAvailableAccessLevel() does a getFileStatus() and, if the access
control list bit is set in the status, also a getAclStatus() call to the
NameNode.
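
To illustrate why this adds up, here is a minimal sketch of the call pattern
described above. It does not use Impala or Hadoop code: FakeNameNode, Status
and AccessLevel are hypothetical stand-ins that only count round trips, and
the 50% share of directories with ACLs is an arbitrary assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: FakeNameNode, Status and AccessLevel are hypothetical stand-ins,
// not Impala or Hadoop classes.
final class AccessCheckCostSketch {
    enum AccessLevel { NONE, READ_ONLY, READ_WRITE }

    // Result of one status lookup: access derived from permission bits, plus the ACL bit.
    static final class Status {
        final AccessLevel fromPermissionBits;
        final boolean hasAcl;
        Status(AccessLevel fromPermissionBits, boolean hasAcl) {
            this.fromPermissionBits = fromPermissionBits;
            this.hasAcl = hasAcl;
        }
    }

    // Counts every simulated round trip to the NameNode.
    static final class FakeNameNode {
        private final Map<String, Status> dirs = new HashMap<>();
        private final Map<String, AccessLevel> acls = new HashMap<>();
        int rpcCount = 0;

        void addDir(String path, AccessLevel level) {
            dirs.put(path, new Status(level, false));
        }
        void addDirWithAcl(String path, AccessLevel bits, AccessLevel aclLevel) {
            dirs.put(path, new Status(bits, true));
            acls.put(path, aclLevel);
        }
        Status getFileStatus(String path) { rpcCount++; return dirs.get(path); }
        AccessLevel getAclStatus(String path) { rpcCount++; return acls.get(path); }
    }

    // One call always, a second call only when the ACL bit is set.
    static AccessLevel getAvailableAccessLevel(FakeNameNode nn, String path) {
        Status status = nn.getFileStatus(path);
        if (!status.hasAcl) return status.fromPermissionBits;
        return nn.getAclStatus(path);
    }

    public static void main(String[] args) {
        FakeNameNode nn = new FakeNameNode();
        for (int i = 0; i < 1000; i++) {
            String path = "/warehouse/t/p=" + i;
            if (i % 2 == 0) {
                nn.addDirWithAcl(path, AccessLevel.READ_ONLY, AccessLevel.READ_WRITE);
            } else {
                nn.addDir(path, AccessLevel.READ_WRITE);
            }
        }
        // Simulate one reload that rechecks every partition directory.
        for (int i = 0; i < 1000; i++) {
            getAvailableAccessLevel(nn, "/warehouse/t/p=" + i);
        }
        System.out.println("RPCs for one reload of 1000 partitions: " + nn.rpcCount);
    }
}
```

With these assumed numbers a single reload issues 1000 + 500 = 1500 calls, and
the same cost is paid again on every event that reloads the partitions.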

It is questionable whether we should recheck this when refreshing tables for
directories that were already checked in the past, as it can be expensive and
the result is unlikely to change. AFAIK having stale data shouldn't cause
security issues: if Impala has no right to access/modify the file, the NameNode
will simply not allow the operation (coordinators/executors use the same
username as catalogd for HDFS ops).
Note that the whole access level check is skipped for most filesystems other
than HDFS (see HdfsTable.assumeReadWriteAccess()).

Currently catalogd checks this for each partition level event (even if they are
batched). While checking it once during CREATE PARTITION makes sense, rechecking
it for every INSERT and ALTER seems like overkill, especially as an INSERT
shouldn't reduce access rights on a partitioned table.

Besides the event processor, rechecking during REFRESH and during reloads after
DML/DDLs is also questionable. If there was an actual change, INVALIDATE
METADATA can be used to reload the table from scratch.
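
A rough sketch of the idea, not a proposed patch: remember the access level
per partition directory, reuse it on reloads, and drop it only on a full
invalidation. AccessLevelCacheSketch and all of its members are hypothetical
names, and the sketch is single-threaded, which real catalogd code could not
assume.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch only: hypothetical names, not Impala classes. Not thread-safe.
final class AccessLevelCacheSketch {
    enum AccessLevel { NONE, READ_ONLY, READ_WRITE }

    private final Map<String, AccessLevel> cache = new HashMap<>();
    private final Function<String, AccessLevel> expensiveCheck;
    int checksIssued = 0;

    // expensiveCheck stands in for the NameNode round trips.
    AccessLevelCacheSketch(Function<String, AccessLevel> expensiveCheck) {
        this.expensiveCheck = expensiveCheck;
    }

    // Runs the expensive check only the first time a directory is seen.
    AccessLevel get(String partitionDir) {
        return cache.computeIfAbsent(partitionDir, dir -> {
            checksIssued++;
            return expensiveCheck.apply(dir);
        });
    }

    // Models a from-scratch reload: every directory is rechecked afterwards.
    void invalidateAll() {
        cache.clear();
    }

    public static void main(String[] args) {
        AccessLevelCacheSketch levels =
            new AccessLevelCacheSketch(dir -> AccessLevel.READ_WRITE);
        String[] parts = {"/t/p=1", "/t/p=2", "/t/p=3"};

        for (String p : parts) levels.get(p);  // initial load: 3 checks
        for (String p : parts) levels.get(p);  // event-driven reload: no new checks
        System.out.println("checks after load + reload: " + levels.checksIssued);

        levels.invalidateAll();                // full invalidation
        for (String p : parts) levels.get(p);  // rechecked from scratch
        System.out.println("checks after invalidate: " + levels.checksIssued);
    }
}
```

In this sketch the reload adds no checks (3 in total), and only the explicit
invalidation brings the count to 6.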



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
