[ 
https://issues.apache.org/jira/browse/HDFS-11058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634890#comment-15634890
 ] 

Andrew Wang commented on HDFS-11058:
------------------------------------

Thanks for the thoughtful response, Manoj. I need to think about this some more, 
but a few ideas for discussion:

bq. IMHO, ViewFsMountPoint should be abstracted and expose only the needed 
attributes – the MountedOn path and its target FileSystem. The FileSystem could 
be a hdfs:// or it could be a one for MergeFs, but I don't see a need for 
exposing all the NameServices, at least for now.

I think the intent was to implement merging in ViewFileSystem itself, rather 
than a new FileSystem. So we'd need to return an array here, like in the 
original MountPoint.

Our user API for referring to a FileSystem is also by URI, not by object 
reference. Yes, the user can always call {{getUri}}, but there is global state 
in a FileSystem like file handles and statistics, and it might be better to not 
share that by handing out a FileSystem object which they can poke at. Also, 
since it looks like we allow mounting subdirectories, {{FileSystem#getUri}} by 
itself is underspecified without the path component.
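
To make the URI-based idea concrete, here is a minimal sketch of a mount point descriptor that hands out the mounted-on path plus an array of target URIs instead of a FileSystem object. The class and method names ({{MountPointInfo}}, {{getMountedOn}}, {{getTargets}}) are hypothetical and only for illustration; they are not taken from the patch or from the existing Hadoop API.

```java
import java.net.URI;

// Hypothetical descriptor for illustration only; not part of the Hadoop API.
final class MountPointInfo {
    private final String mountedOn; // path inside the viewfs namespace, e.g. "/nn2"
    private final URI[] targets;    // one entry for a plain link, several for a merge mount

    MountPointInfo(String mountedOn, URI... targets) {
        this.mountedOn = mountedOn;
        this.targets = targets.clone();
    }

    String getMountedOn() {
        return mountedOn;
    }

    // Defensive copy, so callers cannot mutate the descriptor's state.
    URI[] getTargets() {
        return targets.clone();
    }

    public static void main(String[] args) {
        MountPointInfo mp =
            new MountPointInfo("/nn2", URI.create("hdfs://127.0.0.1:52001/nn2"));
        for (URI t : mp.getTargets()) {
            // The full URI carries both the authority and the mounted subdirectory.
            System.out.println(mp.getMountedOn() + " -> "
                + t.getAuthority() + " " + t.getPath());
        }
    }
}
```

Because each target is a full URI, the subdirectory component ({{/nn2}} here) travels with the authority, and no shared FileSystem state is exposed to the caller.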

Finally, what is the reason for using generics? {{getTargetFileSystem}} will 
always return a FileSystem, right?

bq. <ViewFsUtil>

Not worth separating out, though we should think about this some more.

As a semi-side note, I'm quite surprised that ViewFileSystem is annotated 
@Public. My impression from DistributedFileSystem is that the FileSystem 
subclasses are private, and are only used when cast to a FileSystem. This is 
why we have HdfsAdmin, which lets you do DFS-specific operations.

ViewFsUtil is similar to HdfsAdmin, but used to examine an already created 
ViewFileSystem instance. However, since {{getStatus}} takes a ViewFileSystem, 
it forces the user to downcast, which is unfortunate. Instead, we could have an 
{{isViewFileSystem}} API, and have {{getStatus}} take a {{FileSystem}} and 
throw UnsupportedOperationException if the passed FS is not a ViewFileSystem.
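
A rough sketch of that shape follows. To keep it compilable on its own it uses stand-in types ({{StubFileSystem}}, {{StubViewFileSystem}}, {{StubStatus}}) in place of the real FileSystem, ViewFileSystem, and status classes; those names, the utility class name, and the placeholder numbers are all assumptions for illustration.

```java
// Stand-in types so the sketch compiles without Hadoop on the classpath.
class StubFileSystem {}

class StubStatus {
    final long capacity, used, remaining;
    StubStatus(long capacity, long used, long remaining) {
        this.capacity = capacity;
        this.used = used;
        this.remaining = remaining;
    }
}

class StubViewFileSystem extends StubFileSystem {
    StubStatus status() {
        return new StubStatus(100L, 40L, 60L); // placeholder numbers
    }
}

// Hypothetical utility; the caller never has to downcast.
final class ViewFileSystemUtilSketch {
    private ViewFileSystemUtilSketch() {}

    static boolean isViewFileSystem(StubFileSystem fs) {
        return fs instanceof StubViewFileSystem;
    }

    static StubStatus getStatus(StubFileSystem fs) {
        if (!isViewFileSystem(fs)) {
            throw new UnsupportedOperationException(
                "Not a view file system: " + fs.getClass().getSimpleName());
        }
        // The downcast is confined to the utility.
        return ((StubViewFileSystem) fs).status();
    }

    public static void main(String[] args) {
        StubFileSystem fs = new StubViewFileSystem();
        if (isViewFileSystem(fs)) {
            System.out.println("used=" + getStatus(fs).used);
        }
    }
}
```

Callers that only hold a generic file system reference can check {{isViewFileSystem}} first, or simply call {{getStatus}} and handle the exception.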

Finally, we should probably also name this {{ViewFileSystemUtil}} since 
{{ViewFs}} is the FileContext implementation.

bq.  I have seen the unix df command getting stuck at times when NFS servers 
are not reachable. But, I am totally ok to remove this extra feature and error 
out when any of the backing NameServices are not reachable.

Good point, I've seen similar behavior as well. Let's tackle this in a separate 
JIRA though, and maybe put the behavior behind a flag. I do think we should 
return a non-zero exit code in this case, and think about how scripts will be 
able to parse the output.
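
One possible shape for this, purely as a sketch to discuss in that separate JIRA: print one tab-separated row per mount point, mark unreachable backing namespaces with a placeholder, and return non-zero whenever anything was unreachable. The {{allowPartial}} flag, the {{-}} placeholder, and the row format are assumptions, not existing shell behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DfExitCodeSketch {
    private DfExitCodeSketch() {}

    /**
     * usedByMount maps a mount point to its used bytes, or to null when the
     * backing namespace could not be reached. Returns the process exit code.
     */
    static int report(Map<String, Long> usedByMount, boolean allowPartial,
                      StringBuilder out) {
        boolean anyUnreachable = usedByMount.containsValue(null);
        if (anyUnreachable && !allowPartial) {
            return 1; // error out without printing partial results
        }
        for (Map.Entry<String, Long> e : usedByMount.entrySet()) {
            String used = (e.getValue() == null) ? "-" : Long.toString(e.getValue());
            // One tab-separated row per mount point, easy to split in scripts.
            out.append(e.getKey()).append('\t').append(used).append('\n');
        }
        return anyUnreachable ? 1 : 0; // still non-zero if anything was missing
    }

    public static void main(String[] args) {
        Map<String, Long> used = new LinkedHashMap<>();
        used.put("/nn0", 10854L);
        used.put("/nn1", null); // pretend this namespace is unreachable
        StringBuilder out = new StringBuilder();
        int rc = report(used, true, out);
        System.out.print(out);
        System.out.println("exit code: " + rc);
    }
}
```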

> Implement 'hadoop fs -df' command for ViewFileSystem   
> -------------------------------------------------------
>
>                 Key: HDFS-11058
>                 URL: https://issues.apache.org/jira/browse/HDFS-11058
>             Project: Hadoop HDFS
>          Issue Type: Task
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>              Labels: viewfs
>         Attachments: HDFS-11058.01.patch
>
>
> Df command doesn't seem to work well with ViewFileSystem. It always reports 
> used data as 0. Here is the client mount table configuration I am using 
> against a federated cluster of 2 NameNodes and 2 DataNodes. 
> {code}
> <?xml version="1.0" ?>
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>viewfs://ClusterX/</value>
>   </property>
>   ..
>   <property>
>     <name>fs.default.name</name>
>     <value>viewfs://ClusterX/</value>
>   </property>
>   ..
>   <property>
>     <name>fs.viewfs.mounttable.ClusterX.link./nn0</name>
>     <value>hdfs://127.0.0.1:50001/</value>
>   </property>
>   <property>
>     <name>fs.viewfs.mounttable.ClusterX.link./nn1</name>
>     <value>hdfs://127.0.0.1:51001/</value>
>   </property>
>   <property>
>     <name>fs.viewfs.mounttable.ClusterX.link./nn2</name>
>     <value>hdfs://127.0.0.1:52001/nn2</value>
>   </property>
>   <property>
>     <name>fs.viewfs.mounttable.ClusterX.link./nn3</name>
>     <value>hdfs://127.0.0.1:52001/nn3</value>
>   </property>
>   <property>
>     <name>fs.viewfs.mounttable.ClusterY.linkMergeSlash</name>
>     <value>hdfs://127.0.0.1:50001/</value>
>   </property>
> </configuration>
> {code}
> The {{Df}} command always reports Size/Available as 8.0E and the usage as 0 for 
> any federated cluster. 
> {noformat}
> # hadoop fs -fs viewfs://ClusterX/ -df  /
> Filesystem                         Size  Used            Available  Use%
> viewfs://ClusterX/  9223372036854775807     0  9223372036854775807    0%
> # hadoop fs -fs viewfs://ClusterX/ -df  -h /
> Filesystem           Size  Used  Available  Use%
> viewfs://ClusterX/  8.0 E     0      8.0 E    0%
> # hadoop fs -fs viewfs://ClusterY/ -df  -h /
> Filesystem           Size  Used  Available  Use%
> viewfs://ClusterY/  8.0 E     0      8.0 E    0%
> {noformat}
> Whereas the {{Du}} command seems to work as expected, even with ViewFileSystem.
> {noformat}
> # hadoop fs -fs viewfs://ClusterY/ -du -h /
> 10.6 K  31.8 K  /build.log.16y
> 0       0       /user
> # hadoop fs -fs viewfs://ClusterX/ -du -h /
> 10.6 K  31.8 K  /nn0
> 0       0       /nn1
> 20.2 K  35.8 K  /nn3
> 40.6 K  34.3 K  /nn4
> {noformat}
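
For reference, the 9223372036854775807 in the df output above is exactly Long.MAX_VALUE, which is why it renders as 8.0 E; the numbers look like placeholder defaults rather than values collected from the backing namespaces. The sketch below is not taken from the attached patch. It shows that arithmetic and one way per-target figures could be summed. The {{aggregate}} helper, the {capacity, used, remaining} array layout, and the choice to count each authority only once (so that /nn2 and /nn3, both on 127.0.0.1:52001, are not double-counted) are assumptions for illustration.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

final class DfAggregateSketch {
    private DfAggregateSketch() {}

    /**
     * Sums {capacity, used, remaining} over mount targets, counting each
     * authority (host:port) once. Assumes targets that share an authority
     * report the same namespace-wide figures.
     */
    static long[] aggregate(Map<URI, long[]> statusByTarget) {
        Map<String, long[]> byAuthority = new LinkedHashMap<>();
        for (Map.Entry<URI, long[]> e : statusByTarget.entrySet()) {
            byAuthority.putIfAbsent(e.getKey().getAuthority(), e.getValue());
        }
        long[] total = new long[3];
        for (long[] s : byAuthority.values()) {
            for (int i = 0; i < 3; i++) {
                total[i] += s[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // 2^63 - 1 bytes is 8 EiB after rounding to double.
        System.out.println(Long.MAX_VALUE);
        System.out.println(Long.MAX_VALUE / (double) (1L << 60));

        Map<URI, long[]> status = new LinkedHashMap<>();
        status.put(URI.create("hdfs://127.0.0.1:50001/"), new long[] {100, 10, 90});
        status.put(URI.create("hdfs://127.0.0.1:52001/nn2"), new long[] {100, 10, 90});
        status.put(URI.create("hdfs://127.0.0.1:52001/nn3"), new long[] {100, 10, 90});
        long[] t = aggregate(status);
        System.out.println(t[0] + " " + t[1] + " " + t[2]);
    }
}
```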


