[ https://issues.apache.org/jira/browse/MAPREDUCE-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Kling updated MAPREDUCE-1752:
-------------------------------------
Attachment: MAPREDUCE-1752.3.patch
I have updated Dmytro's patch based on the second solution discussed above.
The new patch also fixes the length of the last block location corresponding
to the requested range. For the example above, it returns the following (as
verified by one of the new test cases):
b0 = <offset=0, length=128>
b1 = <offset=128, length=384>
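
The clipping described above can be sketched as follows. This is only an
illustration, not the code in MAPREDUCE-1752.3.patch: HarRangeSketch, Range,
and clipToFileRange are hypothetical names, and plain offset/length pairs
stand in for Hadoop's block locations. The inputs (two 512-byte part-file
blocks, a file starting at byte 384 of the part file, and a requested range
of bytes 0 to 512) are assumed values chosen because they reproduce the
offsets and lengths listed above; they are not necessarily the example from
the earlier discussion.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; all names here are hypothetical. */
final class HarRangeSketch {

    /** Stand-in for a block location: just an offset and a length. */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return "<offset=" + offset + ", length=" + length + ">";
        }
    }

    /**
     * Translates part-file blocks into offsets relative to the archived file
     * and clips each one to the requested range [start, start + len).
     *
     * @param partBlocks       blocks of the part file, in part-file offsets
     * @param fileOffsetInPart where the archived file begins in the part file
     * @param start            start of the requested range, relative to the file
     * @param len              length of the requested range
     */
    static List<Range> clipToFileRange(List<Range> partBlocks,
                                       long fileOffsetInPart,
                                       long start, long len) {
        long end = start + len; // one past the last requested byte
        List<Range> result = new ArrayList<>();
        for (Range block : partBlocks) {
            // Block boundaries relative to the beginning of the archived file;
            // blockStart is negative if the file begins inside this block.
            long blockStart = block.offset - fileOffsetInPart;
            long blockEnd = blockStart + block.length;
            // Keep only the portion that overlaps the requested range.
            long clippedStart = Math.max(blockStart, start);
            long clippedEnd = Math.min(blockEnd, end);
            if (clippedStart < clippedEnd) {
                result.add(new Range(clippedStart, clippedEnd - clippedStart));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed inputs: two 512-byte part blocks, file at part offset 384.
        List<Range> partBlocks = List.of(new Range(0, 512), new Range(512, 512));
        List<Range> clipped = clipToFileRange(partBlocks, 384, 0, 512);
        for (int i = 0; i < clipped.size(); i++) {
            System.out.println("b" + i + " = " + clipped.get(i));
        }
    }
}
```

Without the clipping at the end of the range, the second block in this sketch
would be reported with its full part-file length of 512 rather than 384,
which is the kind of error in the last block location that the fix addresses.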
> Implement getFileBlockLocations in HarFilesystem
> ------------------------------------------------
>
> Key: MAPREDUCE-1752
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1752
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: harchive
> Reporter: Dmytro Molkov
> Assignee: Dmytro Molkov
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1752.2.patch, MAPREDUCE-1752.3.patch,
> MR-1752.patch
>
>
> To run MapReduce efficiently on data that has been HAR'ed, it would be
> great to implement getFileBlockLocations for a given filename.
> This way the JobTracker will have information about data locality and can
> schedule tasks appropriately.
> I believe the overhead introduced by doing lookups in the index files can be
> smaller than that of copying data over the wire.
> I will upload the patch shortly, but would love to get some feedback on this.
> Any ideas on how to test it are very welcome.