[ https://issues.apache.org/jira/browse/MAPREDUCE-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918129#action_12918129 ]
Ramkumar Vadali commented on MAPREDUCE-2110:
--------------------------------------------
@Mahadev, I agree that exposing an implementation detail is not good. But there
is actually more functionality that we would like to add to HarFileSystem, and
we could use this JIRA to discuss it.
Raid creates a parity file for each data file it raids, and the raided data
file's replication is reduced. This saves disk space but doubles the number of
inodes, so we pack the parity files into HARs to limit the number of new
inodes. The HAR part files also have reduced replication, so a part file can
end up with missing blocks, which we need to fix.
To regenerate a missing HAR part file block, we need to identify which parity
files (and which offsets within them) map to that part file block. This
requires new code that parses the HAR index file and maps
partfile:offset -> datafile:offset. That is the functionality we would actually
like to add. Thoughts?
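For illustration only, here is a minimal sketch of the partfile:offset -> file:offset lookup described above, written as a range query. The sketch takes a part-file block (part name, start offset, block size) and returns every archived-file slice that lies inside it. It assumes a simplified index line format, `<path> file <partFile> <startOffset> <length>`, with names containing no whitespace. The real HarFileSystem index format (name encoding, version-specific fields, the master index) may differ. `HarIndexSketch`, `Entry`, `Segment`, `parse` and `segmentsForBlock` are hypothetical names and are not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: maps a HAR part-file block back to the archived
 * (parity) files whose bytes it holds. The index line format used here is an
 * ASSUMPTION for illustration, not the exact HarFileSystem on-disk format.
 */
public class HarIndexSketch {

    /** Where one archived file lives inside a part file. */
    static final class Entry {
        final String path, partFile;
        final long start, length;

        Entry(String path, String partFile, long start, long length) {
            this.path = path; this.partFile = partFile;
            this.start = start; this.length = length;
        }
    }

    /** A slice of an archived file that falls inside the requested block. */
    static final class Segment {
        final String path;
        final long offsetInFile, partOffset, length;

        Segment(String path, long offsetInFile, long partOffset, long length) {
            this.path = path; this.offsetInFile = offsetInFile;
            this.partOffset = partOffset; this.length = length;
        }

        @Override
        public String toString() {
            return path + ":" + offsetInFile + " <- part@" + partOffset + " len=" + length;
        }
    }

    /** Parses ASSUMED lines "<path> file <partFile> <start> <length>"; other lines are skipped. */
    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 5 || !f[1].equals("file")) {
                continue; // directory entries and malformed lines carry no part-file bytes
            }
            entries.add(new Entry(f[0], f[2], Long.parseLong(f[3]), Long.parseLong(f[4])));
        }
        return entries;
    }

    /** Returns every archived-file slice stored in [blockStart, blockStart + blockSize) of partFile. */
    static List<Segment> segmentsForBlock(List<Entry> entries, String partFile,
                                          long blockStart, long blockSize) {
        long blockEnd = blockStart + blockSize;
        List<Segment> result = new ArrayList<>();
        for (Entry e : entries) {
            if (!e.partFile.equals(partFile)) {
                continue;
            }
            // Intersect the file's byte range in the part file with the block's range.
            long lo = Math.max(e.start, blockStart);
            long hi = Math.min(e.start + e.length, blockEnd);
            if (lo < hi) {
                result.add(new Segment(e.path, lo - e.start, lo, hi - lo));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> index = List.of(
            "/raid/a.parity file part-0 0 100",
            "/raid/b.parity file part-0 100 250",
            "/raid dir none 0 0 a.parity b.parity");
        // A 128-byte block at offset 64 covers the tail of a.parity and the head of b.parity:
        //   /raid/a.parity:64 <- part@64 len=36
        //   /raid/b.parity:0 <- part@100 len=92
        for (Segment s : segmentsForBlock(parse(index), "part-0", 64, 128)) {
            System.out.println(s);
        }
    }
}
```

A linear scan keeps the sketch short. A real implementation over a large index would more likely sort entries by part file and start offset, and then binary-search for the first entry overlapping the block.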
> add getArchiveIndex to HarFileSystem
> ------------------------------------
>
> Key: MAPREDUCE-2110
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2110
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Patrick Kling
> Priority: Minor
> Attachments: MAPREDUCE-2110.patch
>
>
> This patch adds a public getter for archiveIndex to HarFileSystem, allowing
> us to access the index file corresponding to a har file system (useful for
> raid).
> Index: src/tools/org/apache/hadoop/fs/HarFileSystem.java
> ===================================================================
> --- src/tools/org/apache/hadoop/fs/HarFileSystem.java (revision 1004421)
> +++ src/tools/org/apache/hadoop/fs/HarFileSystem.java (working copy)
> @@ -759,6 +759,13 @@
> }
>
> /**
> + * returns the archive index
> + */
> + public Path getArchiveIndex() {
> + return archiveIndex;
> + }
> +
> + /**
> * return the top level archive path.
> */
> public Path getHomeDirectory() {