[ 
https://issues.apache.org/jira/browse/HDFS-12202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16107791#comment-16107791
 ] 

Arun Suresh commented on HDFS-12202:
------------------------------------

[~yzhangal], thanks for starting the discussion.
I understand the motivation for this, but I think we should explore ways to
avoid modifying the HDFS API. One option is to configure the external provider
to return the underlying attributes (and possibly bypass permission checks) for
just a white-listed set of users (and/or a configured set of namespaces).
This implies that performing distcp (without copying over the externally
overlaid attributes) might be restricted to only a few users of the cluster.
From a practical standpoint, I think that is reasonable, since I believe that
for most clusters this cluster-to-cluster copying does not happen very often
and is usually performed by a cluster admin / manager. Thoughts?
(cc [~chris.douglas])
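
To make the white-list idea concrete, here is a minimal sketch. It does not use
the real Hadoop classes: Attrs, AttrProvider, WhitelistBypassProvider and the
user name "distcp-admin" are all hypothetical stand-ins, and how the caller
identity would actually reach the provider is an assumption.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the attributes HDFS itself stores for an inode. */
final class Attrs {
    final String owner;
    final String group;
    final short permission;

    Attrs(String owner, String group, short permission) {
        this.owner = owner;
        this.group = group;
        this.permission = permission;
    }
}

/** Hypothetical provider contract; this is NOT the Hadoop INodeAttributeProvider API. */
interface AttrProvider {
    Attrs getAttributes(String path, String callerUser, Attrs underlying);
}

/** Wraps an external provider and skips it for white-listed callers. */
final class WhitelistBypassProvider implements AttrProvider {
    private final AttrProvider external;
    private final Set<String> bypassUsers;

    WhitelistBypassProvider(AttrProvider external, Set<String> bypassUsers) {
        this.external = external;
        this.bypassUsers = new HashSet<>(bypassUsers); // defensive copy
    }

    @Override
    public Attrs getAttributes(String path, String callerUser, Attrs underlying) {
        if (bypassUsers.contains(callerUser)) {
            // White-listed caller: return what HDFS stores, with no overlay.
            return underlying;
        }
        // Everyone else still sees the externally overlaid attributes.
        return external.getAttributes(path, callerUser, underlying);
    }
}

final class WhitelistBypassDemo {
    public static void main(String[] args) {
        // Assumed external policy: always overlays owner, group and permission.
        AttrProvider external =
            (path, user, base) -> new Attrs("policy-owner", "policy-group", (short) 0750);
        AttrProvider provider =
            new WhitelistBypassProvider(external, Collections.singleton("distcp-admin"));
        Attrs stored = new Attrs("alice", "analysts", (short) 0644);

        System.out.println(provider.getAttributes("/data/f", "distcp-admin", stored).owner);
        System.out.println(provider.getAttributes("/data/f", "bob", stored).owner);
    }
}
```

With this shape the client-facing FileSystem methods stay unchanged; only the
provider configuration decides who sees the stored attributes.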

> Provide new set of FileSystem API to bypass external attribute provider
> -----------------------------------------------------------------------
>
>                 Key: HDFS-12202
>                 URL: https://issues.apache.org/jira/browse/HDFS-12202
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs, hdfs-client
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>
> HDFS client uses 
> {code}
>   /**
>    * Return a file status object that represents the path.
>    * @param f The path we want information from
>    * @return a FileStatus object
>    * @throws FileNotFoundException when the path does not exist
>    * @throws IOException see specific implementation
>    */
>   public abstract FileStatus getFileStatus(Path f) throws IOException;
>   /**
>    * List the statuses of the files/directories in the given path if the
>    * path is a directory.
>    * <p>
>    * Does not guarantee to return the List of files/directories status in a
>    * sorted order.
>    * <p>
>    * Will not return null. Expect IOException upon access error.
>    * @param f given path
>    * @return the statuses of the files/directories in the given path
>    * @throws FileNotFoundException when the path does not exist
>    * @throws IOException see specific implementation
>    */
>   public abstract FileStatus[] listStatus(Path f)
>       throws FileNotFoundException, IOException;
> {code}
> to get the FileStatus of files.
> When an external attribute provider (INodeAttributeProvider) is enabled for a
> cluster, it is consulted for some relevant info (including ACL, group, etc.),
> which is returned in FileStatus.
> This causes a problem when we use distcp to copy files from srcCluster to
> tgtCluster: if srcCluster has an external attribute provider enabled, the data
> we copy would contain data from the attribute provider, which we may not want.
> This jira is created to add a new set of interfaces for distcp to use, so that
> distcp can copy HDFS data only and bypass external attribute provider data.
> The new set of APIs would look like
> {code}
>   /**
>    * Return a file status object that represents the path.
>    * @param f The path we want information from
>    * @param bypassExtAttrProvider if true, bypass external attr provider
>    *        when it's in use.
>    * @return a FileStatus object
>    * @throws FileNotFoundException when the path does not exist
>    * @throws IOException see specific implementation
>    */
>   public FileStatus getFileStatus(Path f,
>       final boolean bypassExtAttrProvider) throws IOException;
>   /**
>    * List the statuses of the files/directories in the given path if the
>    * path is a directory.
>    * <p>
>    * Does not guarantee to return the List of files/directories status in a
>    * sorted order.
>    * <p>
>    * Will not return null. Expect IOException upon access error.
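>    * @param f given path
>    * @param bypassExtAttrProvider if true, bypass external attr provider
>    *        when it's in use.
>    * @return the statuses of the files/directories in the given path
>    * @throws FileNotFoundException when the path does not exist
>    * @throws IOException see specific implementation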
>    */
>   public FileStatus[] listStatus(Path f,
>       final boolean bypassExtAttrProvider)
>       throws FileNotFoundException, IOException;
> {code}
> So when bypassExtAttrProvider is true, the external attribute provider will
> be bypassed.
> Thanks.
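
For comparison, the quoted proposal could be exercised by a copy tool roughly as
in the sketch below. This is only an illustration under stated assumptions:
Status, StatusSource, ToySource and CopyPlanner are hypothetical names, paths
are plain strings, and none of this is the actual FileSystem or DistCp code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for FileStatus (not org.apache.hadoop.fs.FileStatus). */
final class Status {
    final String path;
    final String owner;

    Status(String path, String owner) {
        this.path = path;
        this.owner = owner;
    }
}

/** Hypothetical slice of a file system exposing the proposed overload. */
interface StatusSource {
    Status getFileStatus(String path, boolean bypassExtAttrProvider);
}

/** Toy source: the stored owner is "hdfs-owner", the provider overlays "policy-owner". */
final class ToySource implements StatusSource {
    @Override
    public Status getFileStatus(String path, boolean bypassExtAttrProvider) {
        String owner = bypassExtAttrProvider ? "hdfs-owner" : "policy-owner";
        return new Status(path, owner);
    }
}

final class CopyPlanner {
    /** Collects the statuses a copy job would replicate, reading stored attributes only. */
    static List<Status> plan(StatusSource src, List<String> paths) {
        List<Status> out = new ArrayList<>();
        for (String p : paths) {
            // Pass true so the external provider's overlay is not copied to the target.
            out.add(src.getFileStatus(p, true));
        }
        return out;
    }
}
```

The trade-off against the white-list approach is that here every caller can
request the bypass, so the decision moves from cluster configuration to the
client.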



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
