[ https://issues.apache.org/jira/browse/HDFS-11058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634890#comment-15634890 ]
Andrew Wang commented on HDFS-11058:
------------------------------------

Thanks for the thoughtful response Manoj. I need to think about this some more, but a few ideas for discussion:

bq. IMHO, ViewFsMountPoint should be abstracted and expose only the needed attributes – the MountedOn path and its target FileSystem. The FileSystem could be a hdfs:// or it could be a one for MergeFs, but I don't see a need for exposing all the NameServices, at least for now.

I think the intent was to implement merging in ViewFileSystem itself, rather than in a new FileSystem. So we'd need to return an array here, like in the original MountPoint.

Our user API for referring to a FileSystem is also by URI, not by object reference. Yes, the user can always call {{getUri}}, but there is global state in a FileSystem, like file handles and statistics, and it might be better not to share that by handing out a FileSystem object which they can poke at. Also, since it looks like we allow mounting subdirectories, {{FileSystem#getUri}} by itself is underspecified without the path component.

Finally, what is the reason for using generics? getTargetFileSystem will always return a FileSystem, right?

bq. <ViewFsUtil> Not worth to separate out, though we should think about this some more.

As a semi-side note, I'm quite surprised that ViewFileSystem is annotated @Public. My impression from DistributedFileSystem is that the FileSystem subclasses are private, and are only used when cast to a FileSystem. This is why we have HdfsAdmin, which lets you do DFS-specific operations.

ViewFsUtil is similar to HdfsAdmin, but is used to examine an already created ViewFileSystem instance. However, since {{getStatus}} takes a ViewFileSystem, it forces the user to downcast, which is unfortunate. Instead, we could have an {{isViewFileSystem}} API, and have {{getStatus}} take a {{FileSystem}} and throw UnsupportedOperationException if the passed FS is not a VFS.
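To make the {{isViewFileSystem}} / {{getStatus}} suggestion concrete, here is a minimal, self-contained sketch. It does not use the real Hadoop classes: the nested {{FileSystem}} and {{ViewFileSystem}} types are trimmed-down stand-ins, {{mountPointCount}} is a hypothetical field, and the {{String}} return type is a placeholder for whatever per-mount-point status the patch ends up returning. Only the shape of the API is the point.

```java
import java.net.URI;

public class ViewFileSystemUtilSketch {

  // Stand-in for org.apache.hadoop.fs.FileSystem; only what the sketch needs.
  static class FileSystem {
    private final URI uri;
    FileSystem(URI uri) { this.uri = uri; }
    URI getUri() { return uri; }
  }

  // Stand-in for ViewFileSystem; mountPointCount is a hypothetical field.
  static class ViewFileSystem extends FileSystem {
    final int mountPointCount;
    ViewFileSystem(URI uri, int mountPointCount) {
      super(uri);
      this.mountPointCount = mountPointCount;
    }
  }

  // Lets callers check the type without naming the subclass themselves.
  static boolean isViewFileSystem(FileSystem fs) {
    return fs instanceof ViewFileSystem;
  }

  // Accepts any FileSystem; the downcast stays inside the utility.
  static String getStatus(FileSystem fs) {
    if (!isViewFileSystem(fs)) {
      throw new UnsupportedOperationException(
          "Not a ViewFileSystem: " + fs.getUri());
    }
    ViewFileSystem vfs = (ViewFileSystem) fs;
    return vfs.getUri() + " has " + vfs.mountPointCount + " mount points";
  }
}
```

With this shape, a caller holding a plain {{FileSystem}} reference can guard the call with {{isViewFileSystem(fs)}} and never needs to reference the ViewFileSystem class directly.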
Finally, we should probably also name this {{ViewFileSystemUtil}} since {{ViewFs}} is the FileContext implementation.

bq. I have seen the unix df command getting stuck at times when NFS servers are not reachable. But, I am totally ok to remove this extra feature and error out when any of the backing NameServices are not reachable.

Good point, I've seen similar behavior as well. Let's tackle this in a separate JIRA though, and maybe put the behavior behind a flag. I do think we should return non-zero in this case, and think about how scripts will be able to parse the output.

> Implement 'hadoop fs -df' command for ViewFileSystem
> -------------------------------------------------------
>
>                 Key: HDFS-11058
>                 URL: https://issues.apache.org/jira/browse/HDFS-11058
>             Project: Hadoop HDFS
>          Issue Type: Task
>   Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>              Labels: viewfs
>         Attachments: HDFS-11058.01.patch
>
>
> Df command doesn't seem to work well with ViewFileSystem. It always reports
> used data as 0. Here is the client mount table configuration I am using
> against a federated cluster of 2 NameNodes and 2 DataNodes.
> {code}
> <?xml version="1.0" ?>
> <configuration>
>   <property>
>     <name>fs.defaultFS</name>
>     <value>viewfs://ClusterX/</value>
>   </property>
>   ..
>   <property>
>     <name>fs.default.name</name>
>     <value>viewfs://ClusterX/</value>
>   </property>
>   ..
>   <property>
>     <name>fs.viewfs.mounttable.ClusterX.link./nn0</name>
>     <value>hdfs://127.0.0.1:50001/</value>
>   </property>
>   <property>
>     <name>fs.viewfs.mounttable.ClusterX.link./nn1</name>
>     <value>hdfs://127.0.0.1:51001/</value>
>   </property>
>   <property>
>     <name>fs.viewfs.mounttable.ClusterX.link./nn2</name>
>     <value>hdfs://127.0.0.1:52001/nn2</value>
>   </property>
>   <property>
>     <name>fs.viewfs.mounttable.ClusterX.link./nn3</name>
>     <value>hdfs://127.0.0.1:52001/nn3</value>
>   </property>
>   <property>
>     <name>fs.viewfs.mounttable.ClusterY.linkMergeSlash</name>
>     <value>hdfs://127.0.0.1:50001/</value>
>   </property>
> </configuration>
> {code}
> {{Df}} command always reports Size/Available as 8.0E and the usage as 0 for
> any federated cluster.
> {noformat}
> # hadoop fs -fs viewfs://ClusterX/ -df /
> Filesystem                         Size  Used            Available  Use%
> viewfs://ClusterX/  9223372036854775807     0  9223372036854775807    0%
> # hadoop fs -fs viewfs://ClusterX/ -df -h /
> Filesystem           Size  Used  Available  Use%
> viewfs://ClusterX/  8.0 E     0      8.0 E    0%
> # hadoop fs -fs viewfs://ClusterY/ -df -h /
> Filesystem           Size  Used  Available  Use%
> viewfs://ClusterY/  8.0 E     0      8.0 E    0%
> {noformat}
> Whereas {{Du}} command seems to work as expected even with ViewFileSystem.
> {noformat}
> # hadoop fs -fs viewfs://ClusterY/ -du -h /
> 10.6 K  31.8 K  /build.log.16y
> 0       0       /user
> # hadoop fs -fs viewfs://ClusterX/ -du -h /
> 10.6 K  31.8 K  /nn0
> 0       0       /nn1
> 20.2 K  35.8 K  /nn3
> 40.6 K  34.3 K  /nn4
> {noformat}
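
The comment above asks that df return non-zero when a backing NameService is unreachable. A rough, self-contained sketch of that behavior follows, assuming one output row per mount point. Nothing here is the patch's actual code or Hadoop API: {{StatusProbe}}, {{df}}, and the row format are hypothetical names chosen for illustration, and the probe stands in for whatever call would query the target file system.

```java
import java.io.IOException;
import java.util.Map;

public class ViewFsDfSketch {

  // Hypothetical probe: returns {capacity, used, remaining} for a target URI,
  // or throws IOException if the backing NameService cannot be reached.
  interface StatusProbe {
    long[] probe(String targetUri) throws IOException;
  }

  // Appends one row per mount point to 'out'.
  // Returns 0 if every target answered, 1 if any target was unreachable.
  static int df(Map<String, String> mounts, StatusProbe probe, StringBuilder out) {
    int exitCode = 0;
    for (Map.Entry<String, String> mount : mounts.entrySet()) {
      try {
        long[] s = probe.probe(mount.getValue());
        out.append(mount.getKey())
            .append(' ').append(s[0])
            .append(' ').append(s[1])
            .append(' ').append(s[2])
            .append('\n');
      } catch (IOException e) {
        // Keep reporting the remaining mount points, but remember the failure.
        out.append(mount.getKey())
            .append(" unreachable: ").append(e.getMessage())
            .append('\n');
        exitCode = 1;
      }
    }
    return exitCode;
  }
}
```

Reporting the reachable mount points while still returning a non-zero code lets a script detect the partial failure without losing the rows that did succeed; whether to fail fast instead is the part that could sit behind a flag.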