[ https://issues.apache.org/jira/browse/HADOOP-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756657#action_12756657 ]
Sanjay Radia commented on HADOOP-4952:
--------------------------------------

*On absolute paths etc:*

> would applications need to access FileContext.makeAbsolute()? Can this method
> be made public?

Okay, here is how I have been using the terms. It seems to agree with what is in Path():
- Fully qualified path: has the scheme and authority, and the path component is absolute
- Absolute path: /foo
- Relative path: foo (i.e. relative to the wd)

Applications often want to convert a path to fully qualified if they want to pass it around or store it in a file. In a URI-based file name space it is very dangerous to take an absolute path (i.e. one without the scheme and authority part) and pass it around or store it in a file: if someone uses a different context, there can be closure confusion. (The same is true in Unix - if you store wd-relative names in a file or pass them around, there can be closure confusion.) So we don't need to expose FileContext#makeAbsolute() right now - we can add it later if there is a use case. (BTW, my FileContext#makeAbsolute() was a bad method name - its comment and impl state that it merely fixes the relative part. I have changed the method name to reflect that.)

> Path.isAbsolute() and Path.isPathComponentAbsolute() look strikingly similar to
> me. Do we need both?

It turns out the old Path#isAbsolute() was really doing Path#isPathComponentAbsolute(). But I had to leave Path#isAbsolute()'s impl unchanged, since I did not know whether ALL the callers intended the semantics of isAbsolute() or of isPathComponentAbsolute(). I had intended to file a Jira to explore this and fix it if necessary. If it turns out that all we need is isPathComponentAbsolute(), then we should deprecate isAbsolute(); besides, its impl is incorrect. But if there are use cases for isAbsolute(), then we should fix its impl and manage the change in spec. Sorry, my mistake not to have filed the Jira ahead of time for clarity.

> Improved files system interface for the application writer.
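The three path categories above can be sketched with plain java.net.URI. This is only an illustration of the terminology, not Hadoop's Path API; the class and helper names (PathKinds, isFullyQualified, isAbsolute) are hypothetical:

```java
import java.net.URI;

// Illustrative sketch of the three path categories discussed above:
//   fully qualified: scheme + authority + absolute path, e.g. hdfs://nn:8020/foo
//   absolute:        /foo  (no scheme/authority - ambiguous across contexts)
//   relative:        foo   (resolved against the working directory)
public class PathKinds {

    // Fully qualified: has a scheme, an authority, and an absolute path part.
    static boolean isFullyQualified(URI u) {
        return u.getScheme() != null && u.getAuthority() != null
                && u.getPath() != null && u.getPath().startsWith("/");
    }

    // Absolute in the "path component" sense: path part starts with "/".
    static boolean isAbsolute(URI u) {
        return u.getPath() != null && u.getPath().startsWith("/");
    }

    public static void main(String[] args) {
        URI fq  = URI.create("hdfs://namenode:8020/user/alice/data");
        URI abs = URI.create("/user/alice/data");
        URI rel = URI.create("data");

        System.out.println(isFullyQualified(fq));   // true
        System.out.println(isFullyQualified(abs));  // false: no scheme/authority
        System.out.println(isAbsolute(abs));        // true
        System.out.println(isAbsolute(rel));        // false: wd-relative
    }
}
```

Note that `abs` above is exactly the dangerous case described: the path is absolute but not fully qualified, so its meaning depends on which context resolves it.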
> -----------------------------------------------------------
>
>                 Key: HADOOP-4952
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4952
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>         Attachments: FileContext-common10.patch, FileContext-common11.patch,
> FileContext-common12.patch, FileContext-common13.patch,
> FileContext-common14.patch, FileContext-common16.patch,
> FileContext-common18.patch, FileContext-common19.patch,
> FileContext-common21.patch, FileContext-common22.patch,
> FileContext-hdfs10.patch, FileContext-hdfs11.patch, FileContext3.patch,
> FileContext5.patch, FileContext6.patch, FileContext7.patch,
> FileContext9.patch, Files.java, Files.java, FilesContext1.patch,
> FilesContext2.patch
>
> Currently the FileSystem interface serves two purposes:
> - an application writer's interface for using the Hadoop file system
> - a file system implementer's interface (e.g. hdfs, local file system, kfs, etc.)
> This Jira proposes that we provide a simpler interface for the application
> writer and leave the FileSystem interface for the implementer of a file system.
> - The FileSystem interface has a confusing set of methods for the application
> writer.
> - We could make it easier to take advantage of URI file naming.
> ** The current approach is to get a FileSystem instance by supplying the URI
> and then access that name space. It is consistent for the FileSystem instance
> to not accept URIs for other schemes, but we can do better.
> ** The special copyFromLocalFile can be generalized as a copyFile where the
> src or target can be any URI, including the local one.
> ** The proposed scheme (below) simplifies this.
> - The client-side config can be simplified.
> ** The new config() by default uses the default config. Since this is the
> common usage pattern, one should not need to always pass the config as a
> parameter when accessing the file system.
> ** It does not handle multiple file systems too well. Today a site.xml is
> derived from a single Hadoop cluster. This does not make sense for multiple
> Hadoop clusters, which may have different defaults.
> ** Further, one should need very little to configure the client side:
> *** Default file system.
> *** Block size.
> *** Replication factor.
> *** Scheme-to-class mapping.
> ** It should be possible to take block size and replication factor defaults
> from the target file system, rather than from the client-side config. I am not
> suggesting we don't allow setting client-side defaults, but most clients do
> not care and would find it simpler to take the defaults for their systems
> from the target file system.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
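The last point in the description (block size and replication defaults coming from the target file system unless the client explicitly overrides them) can be sketched roughly as below. This is a hypothetical illustration of the proposed behavior; FsDefaults and effectiveBlockSize are made-up names, not Hadoop API:

```java
import java.util.Optional;

// Hypothetical sketch: each target file system carries its own defaults, and a
// client-side override is optional rather than mandatory.
public class FsDefaults {
    final long blockSize;      // bytes
    final short replication;

    FsDefaults(long blockSize, short replication) {
        this.blockSize = blockSize;
        this.replication = replication;
    }

    // An absent client override defers to the target file system's default.
    static long effectiveBlockSize(Optional<Long> clientOverride, FsDefaults target) {
        return clientOverride.orElse(target.blockSize);
    }

    public static void main(String[] args) {
        FsDefaults clusterA = new FsDefaults(128L << 20, (short) 3); // 128 MB, 3 replicas
        FsDefaults clusterB = new FsDefaults(64L << 20, (short) 2);  // 64 MB, 2 replicas

        // No client override: each cluster's own default wins, so two clusters
        // with different defaults are handled correctly from one client.
        System.out.println(effectiveBlockSize(Optional.empty(), clusterA)); // 134217728
        System.out.println(effectiveBlockSize(Optional.empty(), clusterB)); // 67108864

        // An explicit client-side override still takes precedence.
        System.out.println(effectiveBlockSize(Optional.of(32L << 20), clusterA)); // 33554432
    }
}
```

This is exactly the failure a single site.xml cannot express: one client-side default cannot be right for both clusterA and clusterB at once.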