[ https://issues.apache.org/jira/browse/HADOOP-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593070#action_12593070 ]
Doug Cutting commented on HADOOP-3307:
--------------------------------------

> the intent is to change path to make it work....

Would you special case the handling of "har:" uri's in Path? Or would you always parse queries as part of the hierarchical path? Both of these sound like bad ideas to me.

We should not add special functionality to FileSystem or Path for "har:" uris. We have a proposal that layers cleanly on top of the existing FileSystem and Path implementations. Alternately, we might consider generic extensions to FileSystem and/or Path, like symbolic links or mount points, to see whether these might facilitate a more transparent archive implementation. But we should not add special-purpose hacks for a particular archive format to these generic classes.

Mounts of various sorts would be fairly easy to add, but perhaps not that easy to use. I proposed a simple version above that requires no changes to existing code. A mount capability that permitted one to attach a FileSystem implementation at an arbitrary point in the URI space would not be overly hard to add. The primary downside of mount-based approaches is that they require state. One would have to add something to the configuration or job for each mount point, or require all FileSystem implementations to know how to store a mount, or add a mount file type, or somesuch. Note that this is not a problem with Unix mount, since there's only one system involved, but in a distributed system like Hadoop we need to either transmit the mount points with code (e.g., in the job) or somehow store them in the filesystem.

The current proposal, embedding the URI of the archive within a "har:" uri, will both solve the problems at hand and require no architectural changes to the filesystem. The only downside is that archive file naming is a little obtuse. Long-term, the addition of symbolic links to FileSystem might address that, no?

> Archives in Hadoop.
> -------------------
>
> Key: HADOOP-3307
> URL: https://issues.apache.org/jira/browse/HADOOP-3307
> Project: Hadoop Core
> Issue Type: New Feature
> Components: fs
> Reporter: Mahadev konar
> Assignee: Mahadev konar
> Fix For: 0.18.0
>
>
> This is a new feature for archiving and unarchiving files in HDFS.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
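
Editorial illustration (not part of the original comment): the mount-point alternative weighed in the comment, and the state it would require, might look roughly like the sketch below. It is a minimal sketch under stated assumptions, not anything proposed in the issue. The class MountTable, its methods, the "archivefs:" scheme, and the line-based serialization format are all hypothetical, and it uses only java.net.URI rather than Hadoop's FileSystem or Path classes.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical mount table; nothing here is part of Hadoop's FileSystem or Path API. */
final class MountTable {

    /** Result of a lookup: the file system attached at the mount point and the path below it. */
    static final class Resolved {
        final URI target;       // URI identifying the mounted file system (hypothetical)
        final String remainder; // path relative to the mount point, "" for the mount point itself

        Resolved(URI target, String remainder) {
            this.target = target;
            this.remainder = remainder;
        }
    }

    // Mount point (always stored with a trailing '/') -> URI of the mounted file system.
    private final Map<String, URI> mounts = new LinkedHashMap<>();

    void mount(String mountPoint, URI target) {
        mounts.put(mountPoint.endsWith("/") ? mountPoint : mountPoint + "/", target);
    }

    /** Longest-prefix match on whole path components; returns null if no mount applies. */
    Resolved resolve(URI uri) {
        String s = uri.toString();
        String probe = s.endsWith("/") ? s : s + "/";
        String best = null;
        for (String prefix : mounts.keySet()) {
            if (probe.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        if (best == null) {
            return null;
        }
        String remainder = s.length() > best.length() ? s.substring(best.length()) : "";
        return new Resolved(mounts.get(best), remainder);
    }

    /** Serializes the table, one "mountPoint target" pair per line, so it could travel with a job. */
    String serialize() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, URI> e : mounts.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    /** Rebuilds a table from serialize() output; assumes URIs contain no raw spaces or newlines. */
    static MountTable deserialize(String text) {
        MountTable table = new MountTable();
        for (String line : text.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            int space = line.indexOf(' ');
            table.mount(line.substring(0, space), URI.create(line.substring(space + 1)));
        }
        return table;
    }
}
```

The serialize/deserialize pair is there to make the comment's main objection concrete: every process that resolves paths would need its own copy of this table, so it has to be shipped with the job or stored somewhere in the filesystem. Embedding the archive's URI inside the "har:" URI avoids that extra state.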