[jira] Commented: (HADOOP-3307) Archives in Hadoop.

Devaraj Das (JIRA) Mon, 02 Jun 2008 07:26:07 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601632#action_12601632
 ]


Devaraj Das commented on HADOOP-3307:
-------------------------------------

1) The query part can be dropped when the URI is created (in fact, we 
should probably flag an error if the har path contains a '?', since that is 
not a valid Path).
2) decodeURI should be done first, and then the har archive path can be extracted.
3) getHarAuth needn't parse the URI every time, since it is constant. The 
auth can just be stored in a class variable.
4) open() & other filesystem calls should support taking just the fragment path 
to a file within the archive
5) Why is fileStatusInIndex storing the Store object in a list while going 
through the master index? Isn't the list always going to be of size 1 (if the 
file is present in the archive)?
6) The index files are not closed in the fileStatusInIndex call. This might 
lead to problems when the underlying filesystem is the localfs (where open 
actually returns a file descriptor). But I am also not sure whether we should 
open and close the files on every call to fileStatusInIndex. Can we somehow 
cache the handles to the index files and reuse them?
7) When we create a part file, can we record things like the replication 
factor, permissions, etc., emit them during archive creation just like we emit 
the other info (partfilename, etc.), and store them in the index file? 
That way we don't have to fake everything in listStatus.
8) In listStatus, the start and end braces are missing for the if/else block
9) In listStatus, the check hstatus.isDir()?0:hstatus.getLength() seems 
redundant. hstatus.isDir() is always going to be false.
10) It is not clear to me why makeRelative is called in the listStatus and 
getFileStatus calls.
11) Do you enforce the .har in the archive name when it is created?
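
To make points 1, 3 and 6 concrete, here is a minimal sketch of what I have in 
mind. It is not code from the patch: the class name HarPathSketch, the method 
readFirstIndexLine and the example URI are hypothetical, and it uses only 
java.net.URI and java.nio rather than the Hadoop FileSystem classes, so treat 
it as an illustration under those assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not the class from the patch.
final class HarPathSketch {
    // Point 3: the authority is constant, so parse it once and keep it.
    private final String authority;
    private final String archivePath;

    HarPathSketch(String harUri) {
        // Point 1: flag an error instead of silently carrying a query part.
        if (harUri.indexOf('?') >= 0) {
            throw new IllegalArgumentException(
                "har path must not contain '?': " + harUri);
        }
        URI uri = URI.create(harUri);
        this.authority = uri.getAuthority();
        this.archivePath = uri.getPath();
    }

    // Returns the cached value; no re-parsing on each call.
    String getHarAuth() {
        return authority;
    }

    String getArchivePath() {
        return archivePath;
    }

    // Point 6: the reader is closed on every exit path, including errors.
    // Assumes the index is a local, line-oriented text file.
    static String readFirstIndexLine(String indexFile) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(
                Paths.get(indexFile), StandardCharsets.UTF_8)) {
            return in.readLine();
        }
    }
}
```

Whether the index handles should instead be cached and reused, as asked in 
point 6, is a separate design choice; the sketch only shows the simpler 
open-and-always-close variant.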

I have not finished reviewing the entire patch yet.

> Archives in Hadoop.
> -------------------
>
>                 Key: HADOOP-3307
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3307
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Mahadev konar
>            Assignee: Mahadev konar
>             Fix For: 0.18.0
>
>         Attachments: hadoop-3307_1.patch, hadoop-3307_2.patch
>
>
> This is a new feature for archiving and unarchiving files in HDFS. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
