Quanlong Huang created IMPALA-10272:
---------------------------------------
Summary: LOAD DATA should respect Ranger-HDFS policies
Key: IMPALA-10272
URL: https://issues.apache.org/jira/browse/IMPALA-10272
Project: IMPALA
Issue Type: Bug
Reporter: Quanlong Huang
[~thundergun] reported that analyzing a LOAD DATA statement fails the access
check on the source file even though a Ranger HDFS policy exists that allows
the access. Impala only loads the permissions from HDFS and checks access by
itself. Related code:
https://github.com/apache/impala/blob/ee4043e1a0940ae5711c68336d1ad522631d0e35/fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java#L195-L206
When Ranger authorization is enabled, this check can give the wrong result if
the HDFS permissions are more restrictive than the Ranger policies. According
to the Ranger documentation:
[https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=57901344#RangerUserGuide(workinprogress)-HDFSPolicycreation]
{quote}when the NameNode receives a user request, the Ranger Plugin checks for
policies set through the Ranger Policy Manager. Then, if there are no policies
authorizing the request, the Ranger plugin checks for permissions set in HDFS.
{quote}
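To make the mismatch concrete, here is a toy model of the evaluation order in
the quote, next to a check that looks only at HDFS permissions. It is a sketch
under stated assumptions: the class and method names are hypothetical, the two
sets stand in for "some Ranger policy authorizes this user" and "the HDFS
permissions authorize this user", and anything the quote does not mention
(such as deny rules) is left out.

```java
import java.util.Set;

/** Toy model of the check order described in the Ranger guide (hypothetical names). */
final class RangerHdfsDecision {
  private RangerHdfsDecision() {}

  /**
   * Decision as described in the quote: Ranger policies are consulted first,
   * and HDFS permissions are only a fallback when no policy authorizes the request.
   *
   * @param rangerAllowed users authorized by a Ranger policy (assumed input)
   * @param hdfsAllowed users authorized by plain HDFS permissions (assumed input)
   */
  static boolean isAuthorized(String user, Set<String> rangerAllowed, Set<String> hdfsAllowed) {
    if (rangerAllowed.contains(user)) {
      return true; // a Ranger policy authorizes the request
    }
    return hdfsAllowed.contains(user); // fall back to HDFS permissions
  }

  /** Decision of a check that only looks at HDFS permissions. */
  static boolean hdfsOnly(String user, Set<String> hdfsAllowed) {
    return hdfsAllowed.contains(user);
  }
}
```

For a user that appears only in {{rangerAllowed}}, {{isAuthorized}} returns
true while {{hdfsOnly}} returns false, which is the situation reported here.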
We currently don't have an embedded ranger-hdfs plugin to check this locally.
As a quick fix, I think we can check the access using
{{FileSystem#access(Path path, FsAction mode)}}, which invokes a NameNode RPC.
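A minimal sketch of that idea follows. It is not the actual patch. To keep it
self-contained it does not depend on Hadoop: {{AccessRpc}}, {{Action}} and
{{DeniedException}} are hypothetical stand-ins for {{FileSystem#access}},
{{FsAction}} and Hadoop's {{AccessControlException}}. The point is only the
shape of the check: ask the NameNode, treat a normal return as "allowed" and
an access-control exception as "denied".

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Sketch of delegating the source-path check to the NameNode (hypothetical names). */
final class SourceAccessCheck {
  private SourceAccessCheck() {}

  /** Stand-in for Hadoop's FsAction; only a few values are modeled. */
  enum Action { READ, WRITE, READ_WRITE }

  /** Stand-in for Hadoop's AccessControlException, which is also an IOException. */
  static final class DeniedException extends IOException {
    DeniedException(String message) { super(message); }
  }

  /**
   * Stand-in for the RPC behind FileSystem#access: returns normally when the
   * access is allowed and throws when it is not.
   */
  @FunctionalInterface
  interface AccessRpc {
    void access(String path, Action mode) throws IOException;
  }

  /**
   * Returns true if the remote side allows the access and false if it denies it.
   * Any other I/O failure (for example a missing file) is rethrown unchecked
   * so that it is not mistaken for a permission problem.
   */
  static boolean canAccess(AccessRpc rpc, String path, Action mode) {
    try {
      rpc.access(path, mode);
      return true;
    } catch (DeniedException e) {
      return false;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because the decision is made by the NameNode, whatever authorizer is
configured there, including the Ranger plugin, is applied without Impala
having to evaluate the policies itself. The cost is an extra RPC during
analysis.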