[ 
https://issues.apache.org/jira/browse/IMPALA-12472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770366#comment-17770366
 ] 

Zoltán Borók-Nagy commented on IMPALA-12472:
--------------------------------------------

Maybe both, i.e. a boolean flag for the warehouse and a list of strings for 
any external paths?

Users might always want to specify the warehouse location, and it's less 
error-prone if they don't need to specify the warehouse location in multiple 
places.
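
A minimal sketch of how the two settings could be combined, purely for illustration. Every name here (PermissionCheckPolicy, skipForWarehouse, trustedExternalPrefixes, shouldSkipCheck) is hypothetical and not an existing Impala class or flag; it assumes the warehouse root is already known to catalogd, so the user only toggles a boolean for it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names exist in Impala.
final class PermissionCheckPolicy {
  private final String warehouseRoot;
  private final boolean skipForWarehouse;
  private final List<String> trustedExternalPrefixes = new ArrayList<>();

  PermissionCheckPolicy(String warehouseRoot, boolean skipForWarehouse,
      List<String> externalPrefixes) {
    this.warehouseRoot = normalize(warehouseRoot);
    this.skipForWarehouse = skipForWarehouse;
    for (String prefix : externalPrefixes) {
      trustedExternalPrefixes.add(normalize(prefix));
    }
  }

  // Returns true if the access level check may be skipped for 'path'.
  boolean shouldSkipCheck(String path) {
    String p = normalize(path);
    if (skipForWarehouse && isUnder(p, warehouseRoot)) return true;
    for (String prefix : trustedExternalPrefixes) {
      if (isUnder(p, prefix)) return true;
    }
    return false;
  }

  // Strips trailing slashes so "/a/b/" and "/a/b" compare equal.
  private static String normalize(String path) {
    int end = path.length();
    while (end > 0 && path.charAt(end - 1) == '/') end--;
    return path.substring(0, end);
  }

  // Matches whole path components only, so "/warehouse2" is not
  // treated as being under "/warehouse".
  private static boolean isUnder(String path, String root) {
    return path.equals(root) || path.startsWith(root + "/");
  }
}
```

With this shape the warehouse location is configured in one place only, and the list carries just the external locations.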

> Skip permission check when refreshing in event processor
> --------------------------------------------------------
>
>                 Key: IMPALA-12472
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12472
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Csaba Ringhofer
>            Priority: Major
>
> Saw callstacks where most of EventProcessor's time is spent in rechecking 
> access level for partition directories
> {code}
> org.apache.impala.catalog.HdfsTable.getAvailableAccessLevel
> org.apache.impala.catalog.HdfsTable.createOrUpdatePartitionBuilder
> org.apache.impala.catalog.HdfsTable.createPartitionBuilder
> org.apache.impala.catalog.HdfsTable.reloadPartitions
> org.apache.impala.catalog.HdfsTable.reloadPartitionsFromNames
> org.apache.impala.service.CatalogOpExecutor.reloadPartitionsIfExis
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadPartitions
> org.apache.impala.catalog.events.MetastoreEvents$BatchPartitionEvent.process
> {code}
> HdfsTable.getAvailableAccessLevel() does a getFileStatus(), and if the access 
> control list bit is set in the status, also a getAclStatus() call to the 
> namenode.
> It is questionable whether we should recheck this during refreshing tables 
> for directories that were already checked in the past, as it can be expensive 
> and is unlikely to change. AFAIK having stale data shouldn't cause security 
> issues, as if Impala has no right to access/modify the file, the name node 
> will simply not allow this operation (coordinators/executors use the same 
> username as catalogd for HDFS ops).
> Note that the whole access level check is skipped for most other filesystems 
> than HDFS (see HdfsTable.assumeReadWriteAccess()).
> Currently catalogd checks this for each partition-level event (even if they 
> are batched). While checking it once during CREATE PARTITION makes sense, 
> rechecking it for every INSERT and ALTER seems like overkill, especially as 
> an INSERT shouldn't reduce access rights on a partitioned table.
> Besides the event processor, rechecking during REFRESH and reloads after 
> DML/DDLs is also questionable. If there was an actual change, INVALIDATE 
> METADATA can be used to reload the table from scratch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
