[
https://issues.apache.org/jira/browse/HADOOP-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601632#action_12601632
]
Devaraj Das commented on HADOOP-3307:
-------------------------------------
1) The query part in the creation of the URI can be removed (in fact, we
should probably flag an error if the har path contains a '?', since such a
path is not a valid Path).
2) decodeURI should be called first; the har archive path can then be
extracted.
3) getHarAuth needn't parse the URI every time, since the authority is
constant. It can just be stored in a class variable.
4) open() and the other filesystem calls should support taking just the
fragment path to a file within the archive.
5) Why is fileStatusInIndex storing the Store object in a list while going
through the master index? Isn't the list always going to be of size 1 (if the
file is present in the archive)?
6) The index files are not closed in the fileStatusInIndex call. This might
lead to problems in cases where the underlying filesystem is the localfs
(where open actually returns a file descriptor). But I am also not sure
whether we should open and close on every call to fileStatusInIndex. Can we
somehow cache the handles to the index files and reuse them?
7) When we create a part file, can we record things like the replication
factor, permissions, etc., emit them during archive creation just as we emit
the other info (partfilename, etc.), and store them in the index file?
That way we don't have to fake everything in listStatus.
8) In listStatus, the start and end braces are missing for the if/else block.
9) In listStatus, the check hstatus.isDir()?0:hstatus.getLength() seems
redundant. hstatus.isDir() is always going to be false.
10) I don't clearly understand why makeRelative is done in the listStatus and
getFileStatus calls.
11) Do you enforce the .har in the archive name when it is created?
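
To make points 1 and 3 concrete, here is a rough sketch of what I have in
mind. It uses plain java.net.URI rather than the classes in the patch, and
the names HarUriSketch and checkHarPath are made up for illustration; they
are not in the patch.

```java
import java.net.URI;

/** Hypothetical helper for illustration; not the class from the patch. */
final class HarUriSketch {
    // The authority does not change for the lifetime of the filesystem
    // object, so it is parsed once here instead of on every getHarAuth call.
    private final String harAuth;

    HarUriSketch(URI fsUri) {
        this.harAuth = fsUri.getAuthority();
    }

    /** Returns the cached authority; no URI parsing happens here. */
    String getHarAuth() {
        return harAuth;
    }

    /** Rejects a har path that carries a query part instead of ignoring it. */
    static String checkHarPath(String rawPath) {
        if (rawPath.indexOf('?') >= 0) {
            throw new IllegalArgumentException(
                "har path must not contain '?': " + rawPath);
        }
        return rawPath;
    }
}
```

Whether the error should be an IllegalArgumentException or an IOException is
open; the point is only that the '?' is rejected up front and the authority
is computed once.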
I am not done reviewing the entire patch yet ..
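
On points 5 and 6, the simplest variant (open and close on every lookup,
return the single matching entry rather than a list) could look roughly like
the sketch below. The line-oriented, space-separated index layout and the
name IndexLookupSketch are assumptions made for the example; the real index
format in the patch may differ, and this does not address caching the
handles.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical lookup helper; the index layout here is assumed. */
final class IndexLookupSketch {
    /**
     * Scans an index whose lines start with the file path followed by a
     * space, and returns the first matching line, or null if there is none.
     * The stream is closed on every exit path, including exceptions.
     */
    static String findEntry(InputStream index, String path) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(index, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(path + " ")) {
                    // A file appears at most once, so no list is needed.
                    return line;
                }
            }
            return null;
        }
    }
}
```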
> Archives in Hadoop.
> -------------------
>
> Key: HADOOP-3307
> URL: https://issues.apache.org/jira/browse/HADOOP-3307
> Project: Hadoop Core
> Issue Type: New Feature
> Components: fs
> Reporter: Mahadev konar
> Assignee: Mahadev konar
> Fix For: 0.18.0
>
> Attachments: hadoop-3307_1.patch, hadoop-3307_2.patch
>
>
> This is a new feature for archiving and unarchiving files in HDFS.