[ https://issues.apache.org/jira/browse/IMPALA-12472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769951#comment-17769951 ]

Csaba Ringhofer commented on IMPALA-12472:
------------------------------------------

The callstack in the description may dominate the jstacks for the event processor 
because that code runs on a single thread for all partitions, while file listing 
for partitions happens on multiple threads. It may also be beneficial to keep the 
access level check but move it to the multithreaded part when FsPermissionCache 
is not preloaded.
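A minimal sketch of the second suggestion, assuming a stdlib-only model: the `AccessLevel` enum, the `checkAll` helper and the `expensiveCheck` callback are hypothetical stand-ins rather than Impala's actual classes, and a plain map plays the role of a preloaded FsPermissionCache. Directories found in the cache are answered directly; the remaining checks are submitted to a thread pool, so their NameNode round trips no longer run one after another on the event processor thread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical model, not Impala code: parallelizes per-partition access checks
// that would otherwise run serially on the event processor thread.
class ParallelAccessCheck {
  // Hypothetical stand-in for Impala's access level values.
  enum AccessLevel { NONE, READ_ONLY, WRITE_ONLY, READ_WRITE }

  /**
   * Resolves the access level of every partition directory. Directories present
   * in the (hypothetical) preloaded cache are answered directly; the rest are
   * checked concurrently on a fixed-size thread pool.
   */
  static Map<String, AccessLevel> checkAll(List<String> partitionDirs,
      Map<String, AccessLevel> preloadedCache,
      Function<String, AccessLevel> expensiveCheck,
      int numThreads) throws Exception {
    Map<String, AccessLevel> result = new LinkedHashMap<>();
    Map<String, Future<AccessLevel>> pending = new LinkedHashMap<>();
    ExecutorService pool = Executors.newFixedThreadPool(numThreads);
    try {
      for (String dir : partitionDirs) {
        AccessLevel cached = preloadedCache.get(dir);
        if (cached != null) {
          result.put(dir, cached);
        } else {
          // Cache miss: run the expensive check off the caller's thread.
          pending.put(dir, pool.submit(() -> expensiveCheck.apply(dir)));
        }
      }
      // Wait for every submitted check and collect its result.
      for (Map.Entry<String, Future<AccessLevel>> e : pending.entrySet()) {
        result.put(e.getKey(), e.getValue().get());
      }
    } finally {
      pool.shutdown();
    }
    return result;
  }
}
```

Bounding the pool size (`numThreads` here) keeps the number of concurrent NameNode requests under control, much like the existing multithreaded file listing.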

> Skip permission check when refreshing in event processor
> --------------------------------------------------------
>
>                 Key: IMPALA-12472
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12472
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Csaba Ringhofer
>            Priority: Major
>
> Saw callstacks where most of EventProcessor's time is spent rechecking the 
> access level of partition directories:
> {code}
> org.apache.impala.catalog.HdfsTable.getAvailableAccessLevel
> org.apache.impala.catalog.HdfsTable.createOrUpdatePartitionBuilder
> org.apache.impala.catalog.HdfsTable.createPartitionBuilder
> org.apache.impala.catalog.HdfsTable.reloadPartitions
> org.apache.impala.catalog.HdfsTable.reloadPartitionsFromNames
> org.apache.impala.service.CatalogOpExecutor.reloadPartitionsIfExist
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadPartitions
> org.apache.impala.catalog.events.MetastoreEvents$BatchPartitionEvent.process
> {code}
> HdfsTable.getAvailableAccessLevel() does a getFileStatus() call and, if the 
> ACL bit is set in the returned status, an additional getAclStatus() call to 
> the NameNode.
> It is questionable whether we should recheck this when refreshing tables 
> for directories that were already checked in the past, as the check can be 
> expensive and its result is unlikely to change. AFAIK stale access levels 
> shouldn't cause security issues: if Impala has no right to access/modify a 
> file, the NameNode will simply reject the operation (coordinators/executors 
> use the same username as catalogd for HDFS operations).
> Note that the whole access level check is skipped for most filesystems 
> other than HDFS (see HdfsTable.assumeReadWriteAccess()).
> Currently catalogd performs this check for each partition-level event (even 
> if the events are batched). While checking once during CREATE PARTITION makes 
> sense, rechecking for every INSERT and ALTER seems like overkill; in 
> particular, an INSERT shouldn't reduce access rights on a partitioned table.
> Besides the event processor, rechecking during REFRESH and during reloads after 
> DML/DDL statements is also questionable. If the permissions actually changed, 
> INVALIDATE METADATA can be used to reload the table from scratch.
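The policy proposed in the issue can be sketched as a per-directory cache. Everything here is hypothetical (`PartitionAccessCache`, `EventType`, `onPartitionEvent`, `onRefresh`, `invalidateMetadata`) and is not Impala's API; `expensiveCheck` stands in for the getFileStatus()/getAclStatus() round trips done by HdfsTable.getAvailableAccessLevel(). The sketch checks a directory once when its partition is added, reuses that result for INSERT and ALTER events and for REFRESH, and forgets everything on INVALIDATE METADATA.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Impala code: check each partition directory once and
// reuse the result until INVALIDATE METADATA.
class PartitionAccessCache {
  enum AccessLevel { NONE, READ_ONLY, WRITE_ONLY, READ_WRITE }
  enum EventType { ADD_PARTITION, INSERT, ALTER_PARTITION }

  private final Map<String, AccessLevel> levels = new HashMap<>();
  // Stand-in for getFileStatus() plus, if the ACL bit is set, getAclStatus().
  private final Function<String, AccessLevel> expensiveCheck;

  PartitionAccessCache(Function<String, AccessLevel> expensiveCheck) {
    this.expensiveCheck = expensiveCheck;
  }

  /** Access level to use while reloading a partition for the given event. */
  AccessLevel onPartitionEvent(EventType type, String partitionDir) {
    if (type == EventType.ADD_PARTITION) {
      // New partition: the one place where a fresh check is clearly justified.
      AccessLevel level = expensiveCheck.apply(partitionDir);
      levels.put(partitionDir, level);
      return level;
    }
    // INSERT / ALTER: reuse the earlier result; check only if never seen before.
    return levels.computeIfAbsent(partitionDir, expensiveCheck);
  }

  /** REFRESH (and reloads after DML/DDL) keep the cached levels. */
  AccessLevel onRefresh(String partitionDir) {
    return levels.computeIfAbsent(partitionDir, expensiveCheck);
  }

  /** INVALIDATE METADATA drops everything, so the next load rechecks. */
  void invalidateMetadata() {
    levels.clear();
  }
}
```

The trade-off is that a permission change made outside Impala stays invisible until INVALIDATE METADATA; per the reasoning in the description, this should not weaken security because the NameNode still enforces the real permissions on every operation.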



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
