[
https://issues.apache.org/jira/browse/HDFS-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150921#comment-16150921
]
Manoj Govindassamy commented on HDFS-12357:
-------------------------------------------
Thanks for working on this [~yzhangal]. Thanks [~chris.douglas] for your review
and comments.
I believe the motive here is to strictly avoid returning any external provider
attributes to certain users. Tools like DistCp can call listFileStatus() as this
special user to get plain, standalone HDFS attributes, which can then be _safely_
copied to a remote HDFS. We might not want tools like DistCp to copy external
attributes to HDFS.
Now, this knob for returning external attributes can be given either to HDFS or
to the external provider. While keeping all the logic for returning the right
set of attributes in a single place, such as the provider, does sound like a
very good idea, there is still a gap in the design. If I understand the problem
correctly, the choice of whether to contact the external attribute provider or
fall back to the local default provider needs to be given to HDFS, so that we
can be totally sure the right set of attributes is returned. This guarantee may
not hold if the control is placed in the external provider.
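To make the distinction concrete, here is a rough sketch of what placing the
control on the HDFS side could look like. It is not taken from the attached
patch: the class and interface names (BypassSketch, AttributeSource,
AttributeSourceSelector), the comma-separated user list format, and the sample
user names are all hypothetical and only illustrate the decision point.

```java
import java.util.HashSet;
import java.util.Set;

public class BypassSketch {
    /** Hypothetical stand-in for an attribute source; not the real HDFS interface. */
    interface AttributeSource {
        String ownerOf(String path);
    }

    /** Decides, on the NameNode side, which source to consult for a given caller. */
    static final class AttributeSourceSelector {
        private final AttributeSource local;
        private final AttributeSource external;
        private final Set<String> bypassUsers = new HashSet<>();

        // bypassUserList is an assumed comma-separated config value, e.g. "distcp,backup".
        AttributeSourceSelector(AttributeSource local, AttributeSource external,
                                String bypassUserList) {
            this.local = local;
            this.external = external;
            for (String user : bypassUserList.split(",")) {
                String trimmed = user.trim();
                if (!trimmed.isEmpty()) {
                    bypassUsers.add(trimmed);
                }
            }
        }

        // The external provider is never contacted for a bypass user, so it
        // cannot leak its attributes to that user, whatever its own logic does.
        AttributeSource select(String user) {
            return bypassUsers.contains(user) ? local : external;
        }
    }

    public static void main(String[] args) {
        AttributeSource local = path -> "hdfs-owner";
        AttributeSource external = path -> "provider-owner";
        AttributeSourceSelector selector =
            new AttributeSourceSelector(local, external, "distcp, backup");

        System.out.println(selector.select("distcp").ownerOf("/data/f")); // hdfs-owner
        System.out.println(selector.select("alice").ownerOf("/data/f"));  // provider-owner
    }
}
```

With the check in the selector, the guarantee depends only on HDFS and the
configured user list. If the same check lived inside the external provider,
HDFS would have to trust every provider implementation to honor it.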
> Let NameNode to bypass external attribute provider for special user
> -------------------------------------------------------------------
>
> Key: HDFS-12357
> URL: https://issues.apache.org/jira/browse/HDFS-12357
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Attachments: HDFS-12357.001.patch
>
>
> This is a third proposal to solve the problem described in HDFS-12202.
> The problem is, when we do distcp from one cluster to another (or within the
> same cluster), in addition to copying file data, we copy the metadata from
> source to target. If external attribute provider is enabled, the metadata may
> be read from the provider, thus provider data read from source may be saved
> to target HDFS.
> We want to avoid saving metadata from external provider to HDFS, so we want
> to bypass external provider when doing the distcp (or hadoop fs -cp)
> operation.
> Two alternative approaches were proposed earlier, one in HDFS-12202, the
> other in HDFS-12294. The proposal here is the third one.
> The idea is, we introduce a new config, that specifies a special user (or a
> list of users), and let NN bypass external provider when the current user is
> a special user.
> If we run applications as the special user that need data from external
> attribute provider, then it won't work. So the constraint on this approach
> is, the special users here should not run applications that need data from
> external provider.
> Thanks [~asuresh] for proposing this idea and [~chris.douglas], [~daryn],
> [~manojg] for the discussions in the other jiras.
> I'm creating this one to discuss further.