[ https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027022#comment-14027022 ]
Karthik Kambatla commented on MAPREDUCE-5896:
---------------------------------------------
Review comments:
# Can we make InputSplitLocationInfo extend InputSplit? It doesn't make sense
for any class to implement only InputSplitLocationInfo without implementing
InputSplit.
# I am uncomfortable depending on index matching between
InputSplit#getLocations and InputSplit#getLocationInfo. Would it make sense to
include the string corresponding to the location in SplitLocationInfo itself?
We could then deprecate InputSplit#getLocations(); users would be expected to
use getLocationInfo instead.
# Unrelated to this patch: it is unfortunate that mapreduce.InputSplit
doesn't implement mapred.InputSplit. Would it be easy to fix?
# Nit: Should the following two constants live in SplitLocationInfo instead?
{code}
private static final SplitLocationInfo ON_DISK = new SplitLocationInfo(false);
private static final SplitLocationInfo IN_MEMORY = new SplitLocationInfo(true);
{code}
# Nit: Instead of assigning ON_DISK by default, would it make sense to assign
it after the in-memory check loop, only if the entry is still null?
{code}
for (int i = 0; i < hosts.length; i++) {
  hostInfos[i] = ON_DISK;
  // because N will be tiny, scanning is probably faster than a HashSet
  for (String inMemoryHost : inMemoryHosts) {
    // ...
  }
}
{code}
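To make the last two nits concrete, here is a minimal, self-contained sketch that puts the shared constants on SplitLocationInfo and falls back to ON_DISK only after the in-memory scan. It assumes the boolean-only SplitLocationInfo constructor quoted above; the wrapper class LocationInfoSketch, the helper buildLocationInfos, and the equals-based host match are hypothetical and not taken from the attached patches.

{code}
// Hypothetical sketch, not code from the MAPREDUCE-5896 patches.
public class LocationInfoSketch {

  // Simplified stand-in for the patch's SplitLocationInfo, assuming the
  // boolean-only constructor quoted in the review.
  static final class SplitLocationInfo {
    // Review nit 4: the shared instances live on SplitLocationInfo itself.
    static final SplitLocationInfo ON_DISK = new SplitLocationInfo(false);
    static final SplitLocationInfo IN_MEMORY = new SplitLocationInfo(true);

    private final boolean inMemory;

    private SplitLocationInfo(boolean inMemory) {
      this.inMemory = inMemory;
    }

    boolean isInMemory() {
      return inMemory;
    }

    boolean isOnDisk() {
      return !inMemory;
    }
  }

  // Hypothetical helper: entry i describes hosts[i], so callers still rely on
  // index alignment with hosts (the concern raised in review comment 2).
  static SplitLocationInfo[] buildLocationInfos(String[] hosts, String[] inMemoryHosts) {
    SplitLocationInfo[] hostInfos = new SplitLocationInfo[hosts.length];
    for (int i = 0; i < hosts.length; i++) {
      // Because N will be tiny, a linear scan is probably cheaper than a HashSet.
      for (String inMemoryHost : inMemoryHosts) {
        if (inMemoryHost.equals(hosts[i])) {
          hostInfos[i] = SplitLocationInfo.IN_MEMORY;
          break;
        }
      }
      // Review nit 5: default to ON_DISK only after the scan, if nothing matched.
      if (hostInfos[i] == null) {
        hostInfos[i] = SplitLocationInfo.ON_DISK;
      }
    }
    return hostInfos;
  }

  public static void main(String[] args) {
    SplitLocationInfo[] infos = buildLocationInfos(
        new String[] {"host1", "host2", "host3"}, new String[] {"host2"});
    for (SplitLocationInfo info : infos) {
      System.out.println(info.isInMemory());
    }
  }
}
{code}

Running main prints false, true, false, since only host2 appears in inMemoryHosts. Assigning the default after the scan means each entry is written exactly once, which is the point of the nit.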
> Allow InputSplits to indicate which locations have the block cached in memory
> -----------------------------------------------------------------------------
>
> Key: MAPREDUCE-5896
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 2.4.0
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5896-1.patch, MAPREDUCE-5896.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)