final int max_loc = conf.getInt(MAX_SPLIT_LOCATIONS, 10);
if (locations.length > max_loc) {
  LOG.warn("Max block location exceeded for split: "
      + split + " splitsize: " + locations.length
      + " maxsize: " + max_loc);
  locations = Arrays.copyOf(locations, max_loc);
}
I was wondering about the above code in JobSplitWriter in Hadoop 1.0.3.
The commit comment below is somewhat vague. I saw MAPREDUCE-1943, which
is about setting limits to save memory on the JobTracker. I wanted to
confirm that the fix above only serves as a warning and saves memory on
the JobTracker, and does not cap the input at all, since most
InputFormats seem to ignore the locations. Is that correct?
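
For concreteness, here is a toy sketch (my own code, not from Hadoop's
source) of why I believe the truncation cannot lose input: the host
array is only a locality hint, while the byte range a split covers is
untouched. The path and host names below are made up.

import java.util.Arrays;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class SplitTruncationSketch {
  public static void main(String[] args) throws Exception {
    String[] hosts = {"h1", "h2", "h3", "h4"};
    Path path = new Path("/data/part-00000"); // hypothetical input file
    long start = 0L, length = 128L << 20;     // 128 MB range

    FileSplit full = new FileSplit(path, start, length, hosts);
    FileSplit capped = new FileSplit(path, start, length,
        Arrays.copyOf(hosts, 2)); // same range, fewer locality hints

    // A record reader only consults getPath()/getStart()/getLength();
    // getLocations() is a scheduling hint for the JobTracker.
    System.out.println(full.getStart() == capped.getStart());   // true
    System.out.println(full.getLength() == capped.getLength()); // true
    System.out.println(capped.getLocations().length);           // 2
  }
}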
I also wanted to know why the recent MAPREDUCE-4146 added this cap to
2.0.1-alpha, but with the original capping behavior of failing the job
by throwing an IOException, instead of just warning the user as the
current code does.
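
My reading of the 2.0.1-alpha behavior is roughly the following (a
paraphrased sketch, not the actual patch; the helper class and its
signature are my own invention):

import java.io.IOException;

class StrictSplitCheck {
  // Instead of truncating the array, the writer fails the job outright.
  static String[] checkLocations(String[] locations, int maxLoc,
      Object split) throws IOException {
    if (locations.length > maxLoc) {
      throw new IOException("Max block location exceeded for split: "
          + split + " splitsize: " + locations.length
          + " maxsize: " + maxLoc);
    }
    return locations;
  }
}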
commit 51be5c3d61cbc7960174493428fbaa41d5fbe84d
Author: Chris Douglas <cdoug...@apache.org>
Date:   Fri Oct 1 01:49:51 2010 -0700

    Change client-side enforcement of limit on locations per split to be
    advisory. Truncate on client, optionally fail job at JobTracker if
    exceeded. Added mapreduce.job.max.split.locations property. (cdouglas)
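
Incidentally, if the limit ever matters for a job, I assume the property
named in the commit can be raised per job like any other conf setting.
A minimal sketch (the value 30 is arbitrary):

import org.apache.hadoop.conf.Configuration;

public class RaiseSplitLocationLimit {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The 1.0.3 code above defaults to 10; 30 here is just an example.
    conf.setInt("mapreduce.job.max.split.locations", 30);
    System.out.println(conf.getInt("mapreduce.job.max.split.locations", 10));
  }
}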