Implement getFileBlockLocations in HarFilesystem
------------------------------------------------
Key: MAPREDUCE-1752
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1752
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Dmytro Molkov
To run MapReduce efficiently on data that has been archived into a HAR, it
would be useful to implement getFileBlockLocations for a given file name.
That way the JobTracker would have data-locality information and could
schedule tasks accordingly.
I believe the overhead introduced by the lookups in the index files should be
smaller than the cost of copying the data over the wire.
I will upload a patch shortly, but would love to get some feedback on this
first. Any ideas on how to test it are very welcome.
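
To make the idea concrete, here is a minimal, self-contained sketch of the
offset translation such an implementation would need. It is not the patch and
does not use the Hadoop API. Block is a hypothetical stand-in for Hadoop's
BlockLocation, and locate, fileStart and fileLength are names made up for this
illustration. It assumes the index lookup has already yielded the archived
file's start offset and length inside a single part file, and that the block
list of that part file is known.

```java
import java.util.ArrayList;
import java.util.List;

public class HarBlockLocationSketch {

    /** Hypothetical stand-in for a block location: a byte range plus the hosts storing it. */
    static final class Block {
        final long offset;
        final long length;
        final String[] hosts;

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /**
     * Maps a byte range of an archived file onto the blocks of the part file that stores it.
     *
     * @param partBlocks blocks of the part file, in part-file coordinates
     * @param fileStart  offset of the archived file inside the part file (assumed to come from the index)
     * @param fileLength length of the archived file (assumed to come from the index)
     * @param start      requested start offset, relative to the archived file
     * @param len        requested length in bytes
     * @return the overlapping blocks, clipped to the request and re-expressed
     *         relative to the archived file
     */
    static List<Block> locate(List<Block> partBlocks, long fileStart, long fileLength,
                              long start, long len) {
        List<Block> result = new ArrayList<>();
        if (start < 0 || len <= 0 || start >= fileLength) {
            return result; // nothing to report for an empty or out-of-range request
        }
        // Clip the request to the end of the archived file (written to avoid overflow).
        long end = (len > fileLength - start) ? fileLength : start + len;
        // Translate the request into part-file coordinates.
        long rangeStart = fileStart + start;
        long rangeEnd = fileStart + end;
        for (Block b : partBlocks) {
            long lo = Math.max(b.offset, rangeStart);
            long hi = Math.min(b.offset + b.length, rangeEnd);
            if (lo < hi) {
                // Translate the overlap back into the archived file's coordinates.
                result.add(new Block(lo - fileStart, hi - lo, b.hosts));
            }
        }
        return result;
    }
}
```

For example, with part-file blocks of 100 bytes each on hosts a, b and c, and
an archived file occupying bytes 150 to 250 of the part file, a request for
the whole file returns two entries: bytes 0 to 50 on host b and bytes 50 to
100 on host c. Whether the returned ranges should be clipped like this or
reported as whole blocks is a design choice left open here.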