final int max_loc = conf.getInt(MAX_SPLIT_LOCATIONS, 10);
if (locations.length > max_loc) {
  LOG.warn("Max block location exceeded for split: "
      + split + " splitsize: " + locations.length
      + " maxsize: " + max_loc);
  locations = Arrays.copyOf(locations, max_loc);
}
I was wondering about the above code in JobSplitWriter in Hadoop 1.0.3.
The commit comment below is somewhat vague. I saw MAPREDUCE-1943, which
is about setting limits to save memory on the JobTracker. I wanted to
confirm that the fix above only serves as a warning and saves memory on
the JobTracker, and does not cap the input at all, since most
InputFormats seem to ignore the locations. Is that correct?
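
For concreteness, here is a toy sketch (my own code, not from Hadoop's
source) of why I believe the truncation cannot lose input: the host
array is only a locality hint, while the byte range a split covers is
untouched. The path and host names below are made up.

import java.util.Arrays;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class SplitTruncationSketch {
  public static void main(String[] args) throws Exception {
    String[] hosts = {"h1", "h2", "h3", "h4"};
    Path path = new Path("/data/part-00000"); // hypothetical input file
    long start = 0L, length = 128L << 20;     // 128 MB range

    FileSplit full = new FileSplit(path, start, length, hosts);
    FileSplit capped = new FileSplit(path, start, length,
        Arrays.copyOf(hosts, 2)); // same range, fewer locality hints

    // A record reader only consults getPath()/getStart()/getLength();
    // getLocations() is a scheduling hint for the JobTracker.
    System.out.println(full.getStart() == capped.getStart());   // true
    System.out.println(full.getLength() == capped.getLength()); // true
    System.out.println(capped.getLocations().length);           // 2
  }
}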
I also wanted to know why the recent MAPREDUCE-4146 added this cap to
2.0.1-alpha, but with the original capping behavior of failing the job
by throwing an IOException, instead of just warning the user as the
current code does.
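
My reading of the 2.0.1-alpha behavior is roughly the following (a
paraphrased sketch, not the actual patch; the helper class and its
signature are my own invention):

import java.io.IOException;

class StrictSplitCheck {
  // Instead of truncating the array, the writer fails the job outright.
  static String[] checkLocations(String[] locations, int maxLoc,
      Object split) throws IOException {
    if (locations.length > maxLoc) {
      throw new IOException("Max block location exceeded for split: "
          + split + " splitsize: " + locations.length
          + " maxsize: " + maxLoc);
    }
    return locations;
  }
}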
commit 51be5c3d61cbc7960174493428fbaa41d5fbe84d
Author: Chris Douglas <cdoug...@apache.org>
Date:   Fri Oct 1 01:49:51 2010 -0700

    Change client-side enforcement of limit on locations per split to be
    advisory. Truncate on client, optionally fail job at JobTracker if
    exceeded. Added mapreduce.job.max.split.locations property. (cdouglas)
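
Incidentally, if the limit ever matters for a job, I assume the property
named in the commit can be raised per job like any other conf setting.
A minimal sketch (the value 30 is arbitrary):

import org.apache.hadoop.conf.Configuration;

public class RaiseSplitLocationLimit {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The 1.0.3 code above defaults to 10; 30 here is just an example.
    conf.setInt("mapreduce.job.max.split.locations", 30);
    System.out.println(conf.getInt("mapreduce.job.max.split.locations", 10));
  }
}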