[ 
https://issues.apache.org/jira/browse/IMPALA-12472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770356#comment-17770356
 ] 

Csaba Ringhofer commented on IMPALA-12472:
------------------------------------------

A related comment:
https://issues.apache.org/jira/browse/IMPALA-7539?focusedCommentId=16876527&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16876527

It suggests skipping the permission check for the warehouse directory 
entirely. Adding a flag for this would be simple and could be a huge 
performance improvement:
HdfsTable.assumeReadWriteAccess() decides whether to assume read+write access 
on a given filesystem or whether access checks are needed. We could also pass 
the path and check whether it is a subdirectory of a path where we assume 
read+write access.

I would consider creating a flag that takes a list of paths instead of a 
boolean that only controls whether to skip the check on the warehouse. This 
would allow more fine-grained control, e.g. assuming read+write access on 
some external locations.
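
A minimal sketch of the path-list idea, only to illustrate the prefix check. 
The class TrustedPathPrefixes, its method isTrusted() and the comma-separated 
flag format are hypothetical and not existing Impala code; locations are 
assumed to be valid URIs. The sketch normalizes paths and compares with a 
trailing slash, so that "/warehouse_other" does not match the prefix 
"/warehouse" and ".." segments cannot escape a trusted prefix.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Impala: holds path prefixes under which
// read+write access would be assumed without asking the namenode.
final class TrustedPathPrefixes {
  private final List<String> prefixes = new ArrayList<>();

  // flagValue: comma-separated list of path prefixes (assumed flag format).
  TrustedPathPrefixes(String flagValue) {
    for (String raw : flagValue.split(",")) {
      String trimmed = raw.trim();
      if (!trimmed.isEmpty()) prefixes.add(normalize(trimmed));
    }
  }

  // Resolves "." and ".." segments and appends a trailing slash so that
  // prefix matching only succeeds on whole path components.
  private static String normalize(String path) {
    String s = URI.create(path).normalize().toString();
    return s.endsWith("/") ? s : s + "/";
  }

  // Returns true if location equals or lies below one of the prefixes.
  boolean isTrusted(String location) {
    String candidate = normalize(location);
    for (String prefix : prefixes) {
      if (candidate.startsWith(prefix)) return true;
    }
    return false;
  }
}
```

An empty flag value yields an empty list, so every location falls back to the 
regular access check.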

> Skip permission check when refreshing in event processor
> --------------------------------------------------------
>
>                 Key: IMPALA-12472
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12472
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Csaba Ringhofer
>            Priority: Major
>
> Saw callstacks where most of EventProcessor's time is spent in rechecking 
> access level for partition directories
> {code}
> org.apache.impala.catalog.HdfsTable.getAvailableAccessLevel
> org.apache.impala.catalog.HdfsTable.createOrUpdatePartitionBuilder
> org.apache.impala.catalog.HdfsTable.createPartitionBuilder
> org.apache.impala.catalog.HdfsTable.reloadPartitions
> org.apache.impala.catalog.HdfsTable.reloadPartitionsFromNames
> org.apache.impala.service.CatalogOpExecutor.reloadPartitionsIfExis
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadPartitions
> org.apache.impala.catalog.events.MetastoreEvents$BatchPartitionEvent.process
> {code}
> HdfsTable.getAvailableAccessLevel() does a getFileStatus(), and if access 
> control list bit is set in the status, a getAclStatus() call to the namenode.
> It is questionable whether we should recheck this during refreshing tables 
> for directories that were already checked in the past, as it can be expensive 
> and is unlikely to change. AFAIK having stale data shouldn't cause security 
> issues, as if Impala has no right to access/modify the file, the name node 
> will simply not allow this operation (coordinators/executors use the same 
> username as catalogd for HDFS ops).
> Note that the whole access level check is skipped for most filesystems 
> other than HDFS (see HdfsTable.assumeReadWriteAccess()).
> Currently catalogd checks this for each partition level event (even if they 
> are batched). While checking it once during CREATE PARTITION makes sense, 
> rechecking it for every INSERT and ALTER seems like overkill - especially 
> since an INSERT shouldn't reduce access rights on a partition.
> Besides event processor, rechecking during REFRESH and reloads after 
> DML/DDLs are also questionable. If there was an actual change, INVALIDATE 
> METADATA can be used to reload the table from scratch.



