[ 
https://issues.apache.org/jira/browse/HDFS-12202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124755#comment-16124755
 ] 

Yongjun Zhang commented on HDFS-12202:
--------------------------------------

Thanks a lot for your feedback [~chris.douglas].

Many thanks to [~andrew.wang] for a different proposal, for which I just 
created HDFS-12294/HDFS-12295 (some more subtasks such as httpfs/webhdfs and 
distcp need to be worked on). 

The idea is to encode the additional parameter proposed here to the file path 
itself (by adding a prefix to the path) before making the getFileStatus etc 
calls, and let NN to parse the path and decide whether the bypass the external 
attribute provider. Such that we don't have to change the FileSystem API. We 
need to manipulate the file path string at the server side for different cases. 
That certainly made the scope of change smaller.

Welcome to make comments/suggestions there.





> Provide new set of FileSystem API to bypass external attribute provider
> -----------------------------------------------------------------------
>
>                 Key: HDFS-12202
>                 URL: https://issues.apache.org/jira/browse/HDFS-12202
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs, hdfs-client
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>
> HDFS client uses 
> {code}
>   /**
>    * Return a file status object that represents the path.
>    * @param f The path we want information from
>    * @return a FileStatus object
>    * @throws FileNotFoundException when the path does not exist
>    * @throws IOException see specific implementation
>    */
>   public abstract FileStatus getFileStatus(Path f) throws IOException;
>   /**
>    * List the statuses of the files/directories in the given path if the path 
> is
>    * a directory.
>    * <p>
>    * Does not guarantee to return the List of files/directories status in a
>    * sorted order.
>    * <p>
>    * Will not return null. Expect IOException upon access error.
>    * @param f given path
>    * @return the statuses of the files/directories in the given patch
>    * @throws FileNotFoundException when the path does not exist
>    * @throws IOException see specific implementation
>    */
>   public abstract FileStatus[] listStatus(Path f) throws 
> FileNotFoundException,
>                                                          IOException;
> {code}
> to get FileStatus of files.
> When external attribute provider (INodeAttributeProvider) is enabled for a 
> cluster, the  external attribute provider is consulted to get back some 
> relevant info (including ACL, group etc) and returned back in FileStatus, 
> There is a problem here, when we use distcp to copy files from srcCluster to 
> tgtCluster, if srcCluster has external attribute provider enabled, the data 
> we copied would contain data from attribute provider, which we may not want.
> Create this jira to add a new set of interface for distcp to use, so that 
> distcp can copy HDFS data only and bypass external attribute provider data.
> The new set API would look like
> {code}
>  /**
>    * Return a file status object that represents the path.
>    * @param f The path we want information from
>    * @param bypassExtAttrProvider if true, bypass external attr provider
>    *        when it's in use.
>    * @return a FileStatus object
>    * @throws FileNotFoundException when the path does not exist
>    * @throws IOException see specific implementation
>    */
>   public FileStatus getFileStatus(Path f,
>       final boolean bypassExtAttrProvider) throws IOException;
>   /**
>    * List the statuses of the files/directories in the given path if the path 
> is
>    * a directory.
>    * <p>
>    * Does not guarantee to return the List of files/directories status in a
>    * sorted order.
>    * <p>
>    * Will not return null. Expect IOException upon access error.
>    * @param f
>    * @param bypassExtAttrProvider if true, bypass external attr provider
>    *        when it's in use.
>    * @return
>    * @throws FileNotFoundException
>    * @throws IOException
>    */
>   public FileStatus[] listStatus(Path f,
>       final boolean bypassExtAttrProvider) throws FileNotFoundException,
>                                                   IOException;
> {code}
> So when bypassExtAttrProvider is true, external attribute provider will be 
> bypassed.
> Thanks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to