[
https://issues.apache.org/jira/browse/IMPALA-12472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770366#comment-17770366
]
Zoltán Borók-Nagy commented on IMPALA-12472:
--------------------------------------------
Maybe both, i.e. a boolean flag for the warehouse and a list of strings for any
external paths?
Users might always want to specify the warehouse location, and it's less
error-prone if they don't have to repeat the warehouse location in multiple
places.
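To make the suggestion concrete, here is a minimal sketch of how such a pair of settings could be evaluated. Everything in it is hypothetical: the class name, the field names, and the idea of matching on plain path strings are assumptions for illustration, not existing Impala flags or code. Real table locations are usually full URIs, which this sketch does not handle.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical policy holder; none of these names exist in Impala. */
final class PermissionCheckPolicy {
    private final boolean trustWarehouse;            // the proposed boolean flag
    private final String warehouseRoot;              // warehouse location, known from config
    private final List<String> trustedExternalPaths; // the proposed list of strings

    PermissionCheckPolicy(boolean trustWarehouse, String warehouseRoot,
                          List<String> trustedExternalPaths) {
        this.trustWarehouse = trustWarehouse;
        this.warehouseRoot = stripTrailingSlashes(warehouseRoot);
        this.trustedExternalPaths = new ArrayList<>();
        for (String p : trustedExternalPaths) {
            this.trustedExternalPaths.add(stripTrailingSlashes(p));
        }
    }

    /** Returns true if the permission check may be skipped for this path. */
    boolean shouldSkipCheck(String path) {
        if (trustWarehouse && isUnder(path, warehouseRoot)) return true;
        for (String root : trustedExternalPaths) {
            if (isUnder(path, root)) return true;
        }
        return false;
    }

    /** Prefix match on whole path components, so /warehouse does not match /warehouse2. */
    private static boolean isUnder(String path, String root) {
        if (root.equals("/")) return path.startsWith("/");
        String p = stripTrailingSlashes(path);
        return p.equals(root) || p.startsWith(root + "/");
    }

    private static String stripTrailingSlashes(String s) {
        while (s.length() > 1 && s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }
}
```

With this shape, the warehouse location is never repeated in the path list: the boolean alone covers it, and the list is only needed for external locations.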
> Skip permission check when refreshing in event processor
> --------------------------------------------------------
>
> Key: IMPALA-12472
> URL: https://issues.apache.org/jira/browse/IMPALA-12472
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Csaba Ringhofer
> Priority: Major
>
> Saw callstacks where most of EventProcessor's time is spent in rechecking
> access level for partition directories
> {code}
> org.apache.impala.catalog.HdfsTable.getAvailableAccessLevel
> org.apache.impala.catalog.HdfsTable.createOrUpdatePartitionBuilder
> org.apache.impala.catalog.HdfsTable.createPartitionBuilder
> org.apache.impala.catalog.HdfsTable.reloadPartitions
> org.apache.impala.catalog.HdfsTable.reloadPartitionsFromNames
> org.apache.impala.service.CatalogOpExecutor.reloadPartitionsIfExis
> org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.reloadPartitions
> org.apache.impala.catalog.events.MetastoreEvents$BatchPartitionEvent.process
> {code}
> HdfsTable.getAvailableAccessLevel() does a getFileStatus() and, if the access
> control list bit is set in the status, a getAclStatus() call to the namenode.
> It is questionable whether we should recheck this during refreshing tables
> for directories that were already checked in the past, as it can be expensive
> and is unlikely to change. AFAIK having stale data shouldn't cause security
> issues: if Impala has no right to access/modify the file, the namenode
> will simply not allow the operation (coordinators/executors use the same
> username as catalogd for HDFS ops).
> Note that the whole access level check is skipped for most filesystems
> other than HDFS (see HdfsTable.assumeReadWriteAccess()).
> Currently catalogd checks this for each partition-level event (even if they
> are batched). While checking it once during CREATE PARTITION makes sense,
> rechecking it for every INSERT and ALTER seems like overkill, especially
> since an INSERT shouldn't reduce access rights on a partitioned table.
> Besides the event processor, rechecking during REFRESH and during reloads
> after DML/DDLs is also questionable. If there was an actual change, INVALIDATE
> METADATA can be used to reload the table from scratch.
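The idea in the description (check a directory once, reuse the result on later reloads, and drop it only on a full reload) can be sketched as a small cache. This is only an illustration under stated assumptions: AccessLevelCache, its AccessLevel enum, and the loader function are hypothetical names, and the loader stands in for the getFileStatus()/getAclStatus() round trips to the namenode. It is not how HdfsTable is implemented.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-directory cache of access levels; not Impala code. */
final class AccessLevelCache {
    /** Stand-in enum for illustration only. */
    enum AccessLevel { NONE, READ_ONLY, WRITE_ONLY, READ_WRITE }

    private final Map<String, AccessLevel> cache = new ConcurrentHashMap<>();
    // Stand-in for the expensive namenode lookup.
    private final Function<String, AccessLevel> loader;

    AccessLevelCache(Function<String, AccessLevel> loader) {
        this.loader = loader;
    }

    /** Returns the cached level, calling the loader only on the first request. */
    AccessLevel get(String partitionDir) {
        return cache.computeIfAbsent(partitionDir, loader);
    }

    /** Drops every cached entry, e.g. when the table is reloaded from scratch. */
    void invalidateAll() {
        cache.clear();
    }
}
```

In this sketch an INSERT or ALTER event would call get() and hit the cache, while an INVALIDATE METADATA style reload would call invalidateAll() so the next access rechecks the directory.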