[
https://issues.apache.org/jira/browse/HDFS-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154337#comment-16154337
]
Chris Douglas commented on HDFS-12357:
--------------------------------------
Had offline discussions with [~yzhangal]. We tried a version that would not
only bypass the path component logic, but also add more generic filtering (by
{{INodesInPath}} and {{NodeAttributes}}). Unfortunately, the API is not always
invoked in contexts where this information is freely available.
Internally, the NameNode relies on null values for the
{{INodeAttributeProvider}} and {{AccessControlEnforcer}}; it constructs some
intermediate data to satisfy the plugin APIs. Extending v004/v005 to also avoid
these costs would not be as straightforward as the invocation in
{{FSDirectory}}. Fixing this across all providers, by pushing these conditions
ahead of the call, is a more significant refactor with implications for
existing implementations. [~yzhangal] cited experience in the field, where
copying jobs cause NN failover. We don't have specific data implicating the
costs we're avoiding here, but the more general solution has no willing
implementors, so we can press forward with v001b.
Someone more familiar with external attribute providers should
[verify|https://issues.apache.org/jira/browse/HDFS-12357?focusedCommentId=16151280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16151280]
the bypass of {{AccessControlEnforcer}} for the configured users.
> Let NameNode to bypass external attribute provider for special user
> -------------------------------------------------------------------
>
> Key: HDFS-12357
> URL: https://issues.apache.org/jira/browse/HDFS-12357
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Attachments: HDFS-12357.001a.patch, HDFS-12357.001b.patch,
> HDFS-12357.001.patch, HDFS-12357.002.patch, HDFS-12357.003.patch,
> HDFS-12357.004.patch, HDFS-12357.005.patch
>
>
> This is a third proposal to solve the problem described in HDFS-12202.
> The problem is, when we do distcp from one cluster to another (or within the
> same cluster), in addition to copying file data, we copy the metadata from
> source to target. If an external attribute provider is enabled, the metadata
> may be read from the provider, so provider data read from the source may be
> saved to the target HDFS.
> We want to avoid saving metadata from external provider to HDFS, so we want
> to bypass external provider when doing the distcp (or hadoop fs -cp)
> operation.
> Two alternative approaches were proposed earlier, one in HDFS-12202, the
> other in HDFS-12294. The proposal here is the third one.
> The idea is, we introduce a new config, that specifies a special user (or a
> list of users), and let NN bypass external provider when the current user is
> a special user.
> If we run applications as the special user that need data from external
> attribute provider, then it won't work. So the constraint on this approach
> is, the special users here should not run applications that need data from
> external provider.
> Thanks [~asuresh] for proposing this idea and [~chris.douglas], [~daryn],
> [~manojg] for the discussions in the other jiras.
> I'm creating this one to discuss further.
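
For illustration only, the special-user bypass proposed in the description
above might be sketched roughly as follows. This is not the HDFS-12357 patch:
the {{AttributeSource}} interface, the {{BypassingAttributeResolver}} class,
and the comma-separated user list are hypothetical stand-ins, and the real
{{INodeAttributeProvider}} and {{FSDirectory}} APIs are not used here.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a source of inode attributes. This is NOT the
// real INodeAttributeProvider API; it only models "who owns this path".
interface AttributeSource {
    String ownerOf(String path);
}

// Sketch: consult the external provider unless the calling user is one of
// the configured special (bypass) users.
class BypassingAttributeResolver {
    private final Set<String> bypassUsers = new HashSet<>();
    private final AttributeSource hdfs;      // attributes stored in HDFS itself
    private final AttributeSource external;  // attributes from the external provider

    // commaSeparatedUsers models the value of the proposed new config
    // (assumed format: a comma-separated list of user names).
    BypassingAttributeResolver(String commaSeparatedUsers,
                               AttributeSource hdfs,
                               AttributeSource external) {
        for (String user : commaSeparatedUsers.split(",")) {
            String trimmed = user.trim();
            if (!trimmed.isEmpty()) {
                bypassUsers.add(trimmed);
            }
        }
        this.hdfs = hdfs;
        this.external = external;
    }

    boolean isBypassUser(String user) {
        return bypassUsers.contains(user);
    }

    // Special users see only what is stored in HDFS; everyone else sees
    // the attributes supplied by the external provider.
    String ownerOf(String user, String path) {
        return isBypassUser(user) ? hdfs.ownerOf(path) : external.ownerOf(path);
    }
}
```

Under these assumptions, a distcp job run as a configured special user would
copy only the attributes stored in HDFS, which is also why such a user should
not run applications that depend on provider data.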