[ 
https://issues.apache.org/jira/browse/HADOOP-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593070#action_12593070
 ] 

Doug Cutting commented on HADOOP-3307:
--------------------------------------

> the intent is to change path to make it work.... 

Would you special-case the handling of "har:" URIs in Path?  Or would you 
always parse queries as part of the hierarchical path?  Both of these sound 
like bad ideas to me.
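To make the second option concrete, here is a minimal Java sketch. It assumes a hypothetical layout in which a "har:" URI carries the archive location as its path and the member file inside the archive as its query string. The class and method names are illustrative, not Hadoop code. It only uses java.net.URI to show how a generic URI parser splits such a URI.

```java
import java.net.URI;

/** Hypothetical demo, not Hadoop code: how a generic URI parser splits a query-bearing "har:" URI. */
public class HarQueryDemo {
    // The hierarchical path: everything a path-only consumer would keep.
    static String pathPart(URI uri) {
        return uri.getPath();
    }

    // The query string: in this hypothetical layout, the member inside the archive.
    static String memberPart(URI uri) {
        return uri.getQuery();
    }

    public static void main(String[] args) {
        URI uri = URI.create("har:/user/alice/logs.har?2008/04/part-00000");
        System.out.println("scheme = " + uri.getScheme()); // har
        System.out.println("path   = " + pathPart(uri));   // /user/alice/logs.har
        System.out.println("query  = " + memberPart(uri)); // 2008/04/part-00000
    }
}
```

Code that keeps only the path component loses the member name. Keeping it requires either special-casing "har:" or treating the query as part of the path for every scheme, which are the two options questioned above.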

We should not add special functionality to FileSystem or Path for "har:" URIs.  
We have a proposal that layers cleanly on top of the existing FileSystem and 
Path implementations.  Alternatively, we might consider generic extensions to 
FileSystem and/or Path, like symbolic links or mount points, to see whether 
these might facilitate a more transparent archive implementation.  But we 
should not add special-purpose hacks for a particular archive format to these 
generic classes.

Mounts of various sorts would be fairly easy to add, but perhaps not that easy 
to use.  Earlier in this issue I proposed a simple version that requires no 
changes to existing code.  A mount capability that permitted one to attach a 
FileSystem implementation at an arbitrary point in the URI space would not be 
overly hard to add.

The primary downside of mount-based approaches is that they require state.  One 
would have to add something to the configuration or job for each mount point, 
require all FileSystem implementations to know how to store a mount, add a 
mount file type, or something similar.  This is not a problem for Unix mount, 
since only one system is involved.  In a distributed system like Hadoop, 
however, we need to either transmit the mount points with the code (e.g., in 
the job) or somehow store them in the filesystem.
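The state problem can be illustrated with a small mount table. The sketch below is hypothetical: MountTable, its methods, and the "prefix=target;..." encoding are not Hadoop APIs. It resolves a path through the longest matching mount prefix and flattens itself to a string. That string is exactly the state that would have to travel with each job, or live in the filesystem, so that every task sees the same mounts. It assumes prefixes are absolute paths without a trailing slash and that no prefix or target contains ';' or '='.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical mount table (not a Hadoop API). Assumes prefixes are absolute
 * paths without a trailing slash, and that no prefix or target contains ';' or '='.
 */
public class MountTable {
    // Sorted map so that serialize() output is deterministic.
    private final TreeMap<String, String> mounts = new TreeMap<>();

    public void mount(String prefix, String target) {
        mounts.put(prefix, target);
    }

    /** Rewrites a path through the longest matching mount prefix; unmounted paths pass through. */
    public String resolve(String path) {
        String best = null;
        for (String prefix : mounts.keySet()) {
            boolean covers = path.equals(prefix) || path.startsWith(prefix + "/");
            if (covers && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? path : mounts.get(best) + path.substring(best.length());
    }

    /** Flattens the table to "prefix=target;..." so it could be stored with a job. */
    public String serialize() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : mounts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    /** Rebuilds the table on the task side from serialize() output. */
    public static MountTable deserialize(String encoded) {
        MountTable table = new MountTable();
        if (encoded.isEmpty()) {
            return table;
        }
        for (String entry : encoded.split(";")) {
            int eq = entry.indexOf('=');
            table.mount(entry.substring(0, eq), entry.substring(eq + 1));
        }
        return table;
    }

    public static void main(String[] args) {
        MountTable table = new MountTable();
        table.mount("/mnt/logs", "hdfs://other-nn/logs");
        // This string is the state every task would need to receive.
        String shipped = table.serialize();
        MountTable onTask = MountTable.deserialize(shipped);
        System.out.println(onTask.resolve("/mnt/logs/2008/part-00000")); // hdfs://other-nn/logs/2008/part-00000
    }
}
```

A single string is the simplest carrier. The other alternatives listed above, teaching every FileSystem to store mounts or adding a mount file type, would move the same state into the filesystem instead.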

The current proposal, embedding the URI of the archive within a "har:" URI, 
will both solve the problems at hand and require no architectural changes to 
the filesystem.  The only downside is that archive file naming is a little 
awkward.  Long-term, adding symbolic links to FileSystem might address that, 
no?
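The following minimal sketch embeds an archive's URI in a "har:" URI. It assumes the layout later documented for Hadoop archives, har://<scheme>-<host>:<port>/<archive path>/<file in archive>. The encoding under discussion at this point in the issue may differ. HarUris, toHar and archiveOf are hypothetical names, not Hadoop APIs. The sketch also assumes that the underlying scheme contains no '-', that the archive URI has an authority, and that the archive's path segment ends in ".har".

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helpers, not Hadoop APIs: wrap and unwrap an archive URI inside a "har" URI. */
public class HarUris {
    /** Builds har://<scheme>-<authority>/<archive path>/<member>. */
    static URI toHar(URI archive, String member) {
        if (archive.getAuthority() == null) {
            throw new IllegalArgumentException("archive URI needs an authority: " + archive);
        }
        // Fold the underlying scheme into the authority, e.g. "hdfs" + "-" + "namenode:9000".
        String authority = archive.getScheme() + "-" + archive.getAuthority();
        try {
            return new URI("har", authority, archive.getPath() + "/" + member, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Recovers the archive URI by undoing the authority folding and trimming after ".har". */
    static URI archiveOf(URI har) {
        String folded = har.getAuthority();
        int dash = folded == null ? -1 : folded.indexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("no <scheme>- prefix in authority of " + har);
        }
        String path = har.getPath();
        int cut = path.indexOf(".har/");
        int end;
        if (cut >= 0) {
            end = cut + ".har".length();
        } else if (path.endsWith(".har")) {
            end = path.length();
        } else {
            throw new IllegalArgumentException("no .har component in " + har);
        }
        try {
            return new URI(folded.substring(0, dash), folded.substring(dash + 1),
                           path.substring(0, end), null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI archive = URI.create("hdfs://namenode:9000/user/alice/logs.har");
        URI har = toHar(archive, "2008/part-00000");
        System.out.println(har);            // har://hdfs-namenode:9000/user/alice/logs.har/2008/part-00000
        System.out.println(archiveOf(har)); // hdfs://namenode:9000/user/alice/logs.har
    }
}
```

Because the whole archive location travels inside the URI itself, nothing needs to be stored in a configuration or a mount table. The cost is the less readable naming noted above.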


> Archives in Hadoop.
> -------------------
>
>                 Key: HADOOP-3307
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3307
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Mahadev konar
>            Assignee: Mahadev konar
>             Fix For: 0.18.0
>
>
> This is a new feature for archiving and unarchiving files in HDFS. 

-- 
This message is automatically generated by JIRA.