[jira] Commented: (HADOOP-4952) Improved files system interface for the application writer.

Sanjay Radia (JIRA) Wed, 09 Sep 2009 23:27:24 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753453#action_12753453
 ]


Sanjay Radia commented on HADOOP-4952:
--------------------------------------

@todd

>rather than FileSystem.getInitialWorkingDirectory() returning null by default 
>and having the check in FileContext, have it default to just passing through 
>to getHomeDirectory() in FileSystem.java

I went back and forth on this one. I decided that the filesystem layer needs 
to be dumb: it returns information to the upper layer (FileContext) and lets 
that layer set defaults as needed.
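
A minimal sketch of that split, to make the layering concrete. The types below 
(WorkingDirSketch, Fs, Context) are hypothetical stand-ins for illustration and 
are not the classes in the patch; the only point is where the fallback lives.

```java
/** Illustrative sketch only; these types are hypothetical, not the patch's classes. */
final class WorkingDirSketch {

    /** Lower layer: reports what it knows and applies no policy. */
    interface Fs {
        String getHomeDirectory();

        /** Returns null when the file system has no initial working directory. */
        String getInitialWorkingDirectory();
    }

    /** Upper layer: owns the defaulting decision. */
    static final class Context {
        private final String workingDirectory;

        Context(Fs fs) {
            String initial = fs.getInitialWorkingDirectory();
            // The fallback to the home directory lives here, not in Fs.
            this.workingDirectory = (initial != null) ? initial : fs.getHomeDirectory();
        }

        String getWorkingDirectory() {
            return workingDirectory;
        }
    }

    /** Small helper that builds an Fs with fixed answers for the demo. */
    static Fs fs(final String home, final String initialWd) {
        return new Fs() {
            @Override public String getHomeDirectory() { return home; }
            @Override public String getInitialWorkingDirectory() { return initialWd; }
        };
    }

    public static void main(String[] args) {
        // No initial working directory reported: the upper layer picks the home directory.
        System.out.println(new Context(fs("/user/alice", null)).getWorkingDirectory());
        // An explicit initial working directory is used as-is.
        System.out.println(new Context(fs("/user/alice", "/tmp/work")).getWorkingDirectory());
    }
}
```

With this shape the lower layer never has to guess what the caller wants; a 
different upper layer could choose a different default without touching it.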

>could makeAbsolute be made public? It seems generally useful
Most apps will want to use the public makeQualified(). Absolute paths cannot 
be exchanged across FileContexts and hence can lead to confusion about the 
closure (i.e. context) for the pathname resolution.
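
To illustrate why only qualified names are safe to hand around, here is a 
rough sketch built on java.net.URI. The helper makeQualified below and the 
cluster URIs are made up for the example; it is not the FileContext method, 
and it assumes names without spaces and a working directory with no trailing 
slash.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustrative sketch only; not the FileContext implementation. */
final class QualifySketch {

    /**
     * Resolves a name against a context made of a default file system URI
     * and a working directory, and returns a fully qualified URI.
     */
    static URI makeQualified(URI defaultFs, String workingDir, String name) {
        URI u = URI.create(name);
        if (u.getScheme() != null) {
            return u; // already carries its own scheme, nothing to add
        }
        // Relative names are resolved against the context's working directory.
        String absolute = name.startsWith("/") ? name : workingDir + "/" + name;
        try {
            return new URI(defaultFs.getScheme(), defaultFs.getAuthority(), absolute, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI clusterA = URI.create("hdfs://nn-a:8020"); // hypothetical cluster
        URI clusterB = URI.create("hdfs://nn-b:8020"); // hypothetical cluster

        // The same absolute path means different files in different contexts.
        System.out.println(makeQualified(clusterA, "/user/alice", "/data/part-0"));
        System.out.println(makeQualified(clusterB, "/user/bob", "/data/part-0"));
    }
}
```

The absolute path /data/part-0 names a different file under each context, 
while the qualified form carries its context with it.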

>are these kinds of paths [relative paths with schemes] ever legal in Hadoop? 
>If not, can this check go into the Path constructor such that we can never end 
>up with an invalid object?
Such paths do not make sense for FileContext, were redundant for the existing 
FileSystem, and will not be allowed for the new AbstractFileSystem 
(HADOOP-6223). Shells do allow them but probably should not. So until we 
replace FileSystem with AbstractFileSystem we cannot change the spec for Path. 
As far as the shell goes, we should probably issue a warning when such names 
are used; btw, this may break some of our shell scripts.
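
For reference, the kind of name in question can be detected roughly as below. 
This sketch leans on java.net.URI, where a name with a scheme whose remainder 
does not start with "/" is an "opaque" URI; Path does its own parsing, so 
treat this as an approximation of the check, with a hypothetical class and 
method name.

```java
import java.net.URI;

/** Illustrative sketch only; Path's real parsing differs. */
final class SchemeCheckSketch {

    /** True for names like "hdfs:foo/bar": a scheme followed by a relative path. */
    static boolean isRelativeWithScheme(String name) {
        // java.net.URI calls a URI opaque when it has a scheme and its
        // scheme-specific part does not begin with '/'.
        return URI.create(name).isOpaque();
    }

    public static void main(String[] args) {
        String[] names = { "hdfs:foo/bar", "hdfs://nn/foo/bar", "/foo/bar", "foo/bar" };
        for (String name : names) {
            if (isRelativeWithScheme(name)) {
                // A shell could warn here instead of silently accepting the name.
                System.out.println("warning: relative path with scheme: " + name);
            } else {
                System.out.println("ok: " + name);
            }
        }
    }
}
```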

> in favor of removing this constructor entirely and forcing the user to 
> explicitly choose to construct a new Configuration().
I disagree - most users should not need to know about config; the default 
config from the environment should be good enough.
Further, as we migrate towards little or no client-side config (this jira has 
moved most config vars to the server side) the role of the config becomes less 
important. Besides MR and test programs, I don't see many use cases for an app 
using anything other than the default config.

> Why allow the user to pass either URI or FileSystem instances? There's less 
> code if you just provide one, and the user can always go from one to the 
> other. I'm in favor of fewer code paths where possible.
The new AbstractFileSystem that replaces FileSystem will have a protected 
constructor. So one cannot create a FileSystem using its URI.  Only tests will 
need the static factory method with a FileSystem as a parameter. Most apps will 
use the factory methods that use default config or one where a URI of the 
default FileSystem is passed.

> A lot of the stuff in util is copied straight from FileSystem.java. This code 
> duplication should be avoided. 
FileSystem will be deprecated, removed and replaced by AbstractFileSystem (see 
HADOOP-6223).
Hence these utility methods will exist *only* in the FileContext.




> Improved files system interface for the application writer.
> -----------------------------------------------------------
>
>                 Key: HADOOP-4952
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4952
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>         Attachments: FileContext-common10.patch, FileContext-common11.patch, 
> FileContext-hdfs10.patch, FileContext-hdfs11.patch, FileContext3.patch, 
> FileContext5.patch, FileContext6.patch, FileContext7.patch, 
> FileContext9.patch, Files.java, Files.java, FilesContext1.patch, 
> FilesContext2.patch
>
>
> Currently the FileSystem interface serves two purposes:
> - an application writer's interface for using the Hadoop file system
> - a file system implementer's interface (e.g. hdfs, local file system, kfs, 
> etc)
> This Jira proposes that we provide a simpler interface for the application 
> writer and leave the FileSystem interface for the implementer of a 
> filesystem.
> - The FileSystem interface has a confusing set of methods for the 
> application writer
> - We could make it easier to take advantage of the URI file naming
> ** Current approach is to get FileSystem instance by supplying the URI and 
> then access that name space. It is consistent for the FileSystem instance to 
> not accept URIs for other schemes, but we can do better.
> ** The special copyFromLocalFile can be generalized as a copyFile where the 
> src or target can be any URI, including the local one.
> ** The proposed scheme (below) simplifies this.
> - The client side config can be simplified. 
> ** New config() by default uses the default config. Since this is the common 
> usage pattern, one should not need to always pass the config as a parameter 
> when accessing the file system.  
> ** It does not handle multiple file systems too well. Today a site.xml is 
> derived from a single Hadoop cluster. This does not make sense for multiple 
> Hadoop clusters which may have different defaults.
> ** Further one should need very little to configure the client side:
> *** Default file system.
> *** Block size 
> *** Replication factor
> *** Scheme to class mapping
> ** It should be possible to take the block size and replication factor 
> defaults from the target file system, rather than the client side config.  I 
> am not suggesting we don't allow setting client side defaults, but most 
> clients do not care and would find it simpler to take the defaults for their 
> systems from the target file system. 
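
The last point of the description can be sketched as follows. Everything here 
is hypothetical (the ServerDefaults holder, the method names and the numbers); 
it only shows the precedence being proposed: an explicit client-side value 
wins, otherwise the target file system's default is used.

```java
/** Illustrative sketch only; names and values are hypothetical. */
final class DefaultsSketch {

    /** Defaults advertised by a target file system. */
    static final class ServerDefaults {
        final long blockSize;
        final short replication;

        ServerDefaults(long blockSize, short replication) {
            this.blockSize = blockSize;
            this.replication = replication;
        }
    }

    /** A client-side override wins if present; otherwise use the target's default. */
    static long effectiveBlockSize(Long clientOverride, ServerDefaults target) {
        return (clientOverride != null) ? clientOverride : target.blockSize;
    }

    /** Same precedence for the replication factor. */
    static short effectiveReplication(Short clientOverride, ServerDefaults target) {
        return (clientOverride != null) ? clientOverride : target.replication;
    }

    public static void main(String[] args) {
        // Two clusters with different defaults; no single client config fits both.
        ServerDefaults clusterA = new ServerDefaults(64L * 1024 * 1024, (short) 3);
        ServerDefaults clusterB = new ServerDefaults(128L * 1024 * 1024, (short) 2);

        System.out.println(effectiveBlockSize(null, clusterA));
        System.out.println(effectiveBlockSize(null, clusterB));
        System.out.println(effectiveReplication((short) 5, clusterB));
    }
}
```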

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
