Hi guys,
I suspected that the problem was due to overhead introduced by the
filesystem, so I tried to set the "dfs.replication.max" property to
different values.
First, I tried with 2, and I got a message saying that I was requesting a
value of 3, which was bigger than the limit. So I couldn't do the run(it
seems this 3 is hardcoded somewhere, I read that in Jira).
Then I tried with 3, I could generate the input files for the map reduce
app, but when trying to run I got this one,
Exception in thread "main" java.io.IOException: file
/tmp/hadoop-aandre/mapred/staging/aandre/.staging/job_201106230004_0003/job.jar.
Requested replication 10 exceeds maximum 3
at
org.apache.hadoop.hdfs.server.namenode.BlockManager.verifyReplication(BlockManager.java:468)
which seems like the framework were trying to replicate the output in as
many nodes as possible. Could this be the degradation source?.
Also I attached the log for the run with 7 nodes,.
Alberto.
On 21 June 2011 14:40, Harsh J <[email protected]> wrote:
> Matt,
>
> You're right that it (slowstart) does not / would not affect much. I
> was merely explaining the reason behind his observance of reducers
> getting scheduled early, not really recommending a tweak for
> performance changes there.
>
> On Tue, Jun 21, 2011 at 10:46 PM, GOEKE, MATTHEW (AG/1000)
> <[email protected]> wrote:
> > Harsh,
> >
> > Is it possible for mapred.reduce.slowstart.completed.maps to even play a
> significant role in this? The only benefit he would find in tweaking that
> for his problem would be to spread network traffic from the shuffle over a
> longer period of time at a cost of having the reducer using resources
> earlier. Either way he would see this effect across both sets of runs if he
> is using the default parameters. I guess it would all depend on what kind of
> network layout the cluster is on.
> >
> > Matt
> >
> > -----Original Message-----
> > From: Harsh J [mailto:[email protected]]
> > Sent: Tuesday, June 21, 2011 12:09 PM
> > To: [email protected]
> > Subject: Re: Poor scalability with map reduce application
> >
> > Alberto,
> >
> > On Tue, Jun 21, 2011 at 10:27 PM, Alberto Andreotti
> > <[email protected]> wrote:
> >> I don't know if speculatives maps are on, I'll check it. One thing I
> >> observed is that reduces begin before all maps have finished. Let me
> check
> >> also if the difference is on the map side or in the reduce. I believe
> it's
> >> balanced, both are slower when adding more nodes, but i'll confirm that.
> >
> > Maps and reduces are speculative by default, so must've been ON. Could
> > you also post a general input vs. output record counts and statistics
> > like that between your job runs, to correlate?
> >
> > The reducers get scheduled early but do not exactly "reduce()" until
> > all maps are done. They just keep fetching outputs. Their scheduling
> > can be controlled with some configurations (say, to start only after
> > X% of maps are done -- by default it starts up when 5% of maps are
> > done).
> >
> > --
> > Harsh J
> > This e-mail message may contain privileged and/or confidential
> information, and is intended to be received only by persons entitled
> > to receive such information. If you have received this e-mail in error,
> please notify the sender immediately. Please delete it and
> > all attachments from any servers, hard drives or any other media. Other
> use of this e-mail by you is strictly prohibited.
> >
> > All e-mails and attachments sent and received are subject to monitoring,
> reading and archival by Monsanto, including its
> > subsidiaries. The recipient of this e-mail is solely responsible for
> checking for the presence of "Viruses" or other "Malware".
> > Monsanto, along with its subsidiaries, accepts no liability for any
> damage caused by any such code transmitted by or accompanying
> > this e-mail or any attachment.
> >
> >
> > The information contained in this email may be subject to the export
> control laws and regulations of the United States, potentially
> > including but not limited to the Export Administration Regulations (EAR)
> and sanctions regulations issued by the U.S. Department of
> > Treasury, Office of Foreign Asset Controls (OFAC). As a recipient of
> this information you are obligated to comply with all
> > applicable U.S. export laws and regulations.
> >
> >
>
>
>
> --
> Harsh J
>
--
José Pablo Alberto Andreotti.
Tel: 54 351 4730292
Móvil: 54351156526363.
MSN: [email protected]
Skype: andreottialberto
11/06/23 01:09:38 INFO security.Groups: Group mapping
impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
11/06/23 01:09:38 WARN conf.Configuration: mapred.task.id is deprecated.
Instead, use mapreduce.task.attempt.id
11/06/23 01:09:38 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
11/06/23 01:09:38 INFO input.FileInputFormat: Total input paths to process : 1
11/06/23 01:09:40 WARN conf.Configuration: mapred.map.tasks is deprecated.
Instead, use mapreduce.job.maps
11/06/23 01:09:40 INFO mapreduce.JobSubmitter: number of splits:18
11/06/23 01:09:40 INFO mapreduce.JobSubmitter: adding the following namenodes'
delegation tokens:null
11/06/23 01:09:40 INFO mapreduce.Job: Running job: job_201106230057_0001
11/06/23 01:09:41 INFO mapreduce.Job: map 0% reduce 0%
11/06/23 01:09:58 INFO mapreduce.Job: map 3% reduce 0%
11/06/23 01:10:00 INFO mapreduce.Job: map 5% reduce 0%
11/06/23 01:10:01 INFO mapreduce.Job: map 9% reduce 0%
11/06/23 01:10:03 INFO mapreduce.Job: map 10% reduce 0%
11/06/23 01:10:04 INFO mapreduce.Job: map 13% reduce 0%
11/06/23 01:10:07 INFO mapreduce.Job: map 15% reduce 0%
11/06/23 01:10:10 INFO mapreduce.Job: map 16% reduce 0%
11/06/23 01:10:15 INFO mapreduce.Job: map 17% reduce 0%
11/06/23 01:10:18 INFO mapreduce.Job: map 18% reduce 0%
11/06/23 01:10:19 INFO mapreduce.Job: map 19% reduce 0%
11/06/23 01:10:21 INFO mapreduce.Job: map 20% reduce 0%
11/06/23 01:10:22 INFO mapreduce.Job: map 22% reduce 0%
11/06/23 01:10:25 INFO mapreduce.Job: map 25% reduce 0%
11/06/23 01:10:28 INFO mapreduce.Job: map 27% reduce 0%
11/06/23 01:10:31 INFO mapreduce.Job: map 28% reduce 0%
11/06/23 01:10:33 INFO mapreduce.Job: map 29% reduce 0%
11/06/23 01:10:34 INFO mapreduce.Job: map 30% reduce 0%
11/06/23 01:10:36 INFO mapreduce.Job: map 31% reduce 0%
11/06/23 01:10:37 INFO mapreduce.Job: map 32% reduce 0%
11/06/23 01:10:39 INFO mapreduce.Job: map 33% reduce 0%
11/06/23 01:10:40 INFO mapreduce.Job: map 34% reduce 0%
11/06/23 01:10:43 INFO mapreduce.Job: map 35% reduce 0%
11/06/23 01:10:46 INFO mapreduce.Job: map 37% reduce 0%
11/06/23 01:10:50 INFO mapreduce.Job: map 38% reduce 0%
11/06/23 01:10:53 INFO mapreduce.Job: map 40% reduce 0%
11/06/23 01:10:55 INFO mapreduce.Job: map 41% reduce 0%
11/06/23 01:10:56 INFO mapreduce.Job: map 43% reduce 0%
11/06/23 01:10:58 INFO mapreduce.Job: map 45% reduce 0%
11/06/23 01:10:59 INFO mapreduce.Job: map 47% reduce 0%
11/06/23 01:11:02 INFO mapreduce.Job: map 49% reduce 0%
11/06/23 01:11:05 INFO mapreduce.Job: map 50% reduce 0%
11/06/23 01:11:08 INFO mapreduce.Job: map 51% reduce 3%
11/06/23 01:11:11 INFO mapreduce.Job: map 53% reduce 5%
11/06/23 01:11:13 INFO mapreduce.Job: map 54% reduce 5%
11/06/23 01:11:14 INFO mapreduce.Job: map 57% reduce 5%
11/06/23 01:11:16 INFO mapreduce.Job: map 58% reduce 5%
11/06/23 01:11:17 INFO mapreduce.Job: map 60% reduce 7%
11/06/23 01:11:19 INFO mapreduce.Job: map 61% reduce 7%
11/06/23 01:11:20 INFO mapreduce.Job: map 63% reduce 7%
11/06/23 01:11:23 INFO mapreduce.Job: map 65% reduce 7%
11/06/23 01:11:24 INFO mapreduce.Job: map 65% reduce 9%
11/06/23 01:11:26 INFO mapreduce.Job: map 67% reduce 9%
11/06/23 01:11:27 INFO mapreduce.Job: map 67% reduce 12%
11/06/23 01:11:28 INFO mapreduce.Job: map 68% reduce 12%
11/06/23 01:11:29 INFO mapreduce.Job: map 69% reduce 12%
11/06/23 01:11:30 INFO mapreduce.Job: map 69% reduce 14%
11/06/23 01:11:31 INFO mapreduce.Job: map 70% reduce 14%
11/06/23 01:11:32 INFO mapreduce.Job: map 71% reduce 14%
11/06/23 01:11:33 INFO mapreduce.Job: map 72% reduce 14%
11/06/23 01:11:34 INFO mapreduce.Job: map 73% reduce 14%
11/06/23 01:11:35 INFO mapreduce.Job: map 76% reduce 14%
11/06/23 01:11:36 INFO mapreduce.Job: map 76% reduce 16%
11/06/23 01:11:38 INFO mapreduce.Job: map 78% reduce 16%
11/06/23 01:11:41 INFO mapreduce.Job: map 79% reduce 16%
11/06/23 01:11:44 INFO mapreduce.Job: map 80% reduce 18%
11/06/23 01:11:47 INFO mapreduce.Job: map 81% reduce 20%
11/06/23 01:11:49 INFO mapreduce.Job: map 82% reduce 20%
11/06/23 01:11:50 INFO mapreduce.Job: map 83% reduce 20%
11/06/23 01:11:52 INFO mapreduce.Job: map 84% reduce 20%
11/06/23 01:11:56 INFO mapreduce.Job: map 85% reduce 20%
11/06/23 01:11:59 INFO mapreduce.Job: map 86% reduce 22%
11/06/23 01:12:02 INFO mapreduce.Job: map 87% reduce 22%
11/06/23 01:12:04 INFO mapreduce.Job: map 88% reduce 22%
11/06/23 01:12:05 INFO mapreduce.Job: map 89% reduce 22%
11/06/23 01:12:07 INFO mapreduce.Job: map 91% reduce 22%
11/06/23 01:12:11 INFO mapreduce.Job: map 92% reduce 22%
11/06/23 01:12:14 INFO mapreduce.Job: map 93% reduce 24%
11/06/23 01:12:16 INFO mapreduce.Job: map 94% reduce 24%
11/06/23 01:12:17 INFO mapreduce.Job: map 95% reduce 25%
11/06/23 01:12:20 INFO mapreduce.Job: map 96% reduce 27%
11/06/23 01:12:23 INFO mapreduce.Job: map 97% reduce 27%
11/06/23 01:12:26 INFO mapreduce.Job: map 97% reduce 29%
11/06/23 01:12:29 INFO mapreduce.Job: map 98% reduce 29%
11/06/23 01:12:32 INFO mapreduce.Job: map 98% reduce 31%
11/06/23 01:12:49 INFO mapreduce.Job: map 99% reduce 31%
11/06/23 01:12:55 INFO mapreduce.Job: map 100% reduce 31%
11/06/23 01:12:59 INFO mapreduce.Job: map 100% reduce 33%
11/06/23 01:13:02 INFO mapreduce.Job: map 100% reduce 34%
11/06/23 01:13:05 INFO mapreduce.Job: map 100% reduce 35%
11/06/23 01:13:08 INFO mapreduce.Job: map 100% reduce 36%
11/06/23 01:13:14 INFO mapreduce.Job: map 100% reduce 37%
11/06/23 01:13:20 INFO mapreduce.Job: map 100% reduce 38%
11/06/23 01:13:23 INFO mapreduce.Job: map 100% reduce 39%
11/06/23 01:13:29 INFO mapreduce.Job: map 100% reduce 40%
11/06/23 01:13:35 INFO mapreduce.Job: map 100% reduce 41%
11/06/23 01:13:41 INFO mapreduce.Job: map 100% reduce 42%
11/06/23 01:13:44 INFO mapreduce.Job: map 100% reduce 43%
11/06/23 01:13:50 INFO mapreduce.Job: map 100% reduce 44%
11/06/23 01:13:53 INFO mapreduce.Job: map 100% reduce 45%
11/06/23 01:13:59 INFO mapreduce.Job: map 100% reduce 46%
11/06/23 01:14:02 INFO mapreduce.Job: map 100% reduce 47%
11/06/23 01:14:05 INFO mapreduce.Job: map 100% reduce 48%
11/06/23 01:14:08 INFO mapreduce.Job: map 100% reduce 49%
11/06/23 01:14:11 INFO mapreduce.Job: map 100% reduce 51%
11/06/23 01:14:14 INFO mapreduce.Job: map 100% reduce 52%
11/06/23 01:14:20 INFO mapreduce.Job: map 100% reduce 53%
11/06/23 01:14:26 INFO mapreduce.Job: map 100% reduce 54%
11/06/23 01:14:35 INFO mapreduce.Job: map 100% reduce 55%
11/06/23 01:14:38 INFO mapreduce.Job: map 100% reduce 56%
11/06/23 01:14:41 INFO mapreduce.Job: map 100% reduce 57%
11/06/23 01:14:47 INFO mapreduce.Job: map 100% reduce 58%
11/06/23 01:14:50 INFO mapreduce.Job: map 100% reduce 59%
11/06/23 01:14:53 INFO mapreduce.Job: map 100% reduce 60%
11/06/23 01:14:56 INFO mapreduce.Job: map 100% reduce 61%
11/06/23 01:14:59 INFO mapreduce.Job: map 100% reduce 62%
11/06/23 01:15:05 INFO mapreduce.Job: map 100% reduce 63%
11/06/23 01:15:08 INFO mapreduce.Job: map 100% reduce 64%
11/06/23 01:15:14 INFO mapreduce.Job: map 100% reduce 65%
11/06/23 01:15:17 INFO mapreduce.Job: map 100% reduce 66%
11/06/23 01:15:23 INFO mapreduce.Job: map 100% reduce 67%
11/06/23 01:15:29 INFO mapreduce.Job: map 100% reduce 68%
11/06/23 01:15:35 INFO mapreduce.Job: map 100% reduce 69%
11/06/23 01:15:50 INFO mapreduce.Job: map 100% reduce 70%
11/06/23 01:16:02 INFO mapreduce.Job: map 100% reduce 71%
11/06/23 01:16:12 INFO mapreduce.Job: map 100% reduce 72%
11/06/23 01:16:18 INFO mapreduce.Job: map 100% reduce 73%
11/06/23 01:16:27 INFO mapreduce.Job: map 100% reduce 74%
11/06/23 01:16:39 INFO mapreduce.Job: map 100% reduce 75%
11/06/23 01:16:48 INFO mapreduce.Job: map 100% reduce 76%
11/06/23 01:16:57 INFO mapreduce.Job: map 100% reduce 77%
11/06/23 01:17:06 INFO mapreduce.Job: map 100% reduce 78%
11/06/23 01:17:12 INFO mapreduce.Job: map 100% reduce 79%
11/06/23 01:17:24 INFO mapreduce.Job: map 100% reduce 80%
11/06/23 01:17:36 INFO mapreduce.Job: map 100% reduce 81%
11/06/23 01:17:48 INFO mapreduce.Job: map 100% reduce 82%
11/06/23 01:17:57 INFO mapreduce.Job: map 100% reduce 83%
11/06/23 01:18:15 INFO mapreduce.Job: map 100% reduce 84%
11/06/23 01:18:24 INFO mapreduce.Job: map 100% reduce 85%
11/06/23 01:18:33 INFO mapreduce.Job: map 100% reduce 86%
11/06/23 01:18:39 INFO mapreduce.Job: map 100% reduce 87%
11/06/23 01:18:51 INFO mapreduce.Job: map 100% reduce 88%
11/06/23 01:18:57 INFO mapreduce.Job: map 100% reduce 89%
11/06/23 01:19:06 INFO mapreduce.Job: map 100% reduce 90%
11/06/23 01:19:12 INFO mapreduce.Job: map 100% reduce 91%
11/06/23 01:19:24 INFO mapreduce.Job: map 100% reduce 92%
11/06/23 01:19:36 INFO mapreduce.Job: map 100% reduce 93%
11/06/23 01:19:48 INFO mapreduce.Job: map 100% reduce 94%
11/06/23 01:19:57 INFO mapreduce.Job: map 100% reduce 95%
11/06/23 01:20:03 INFO mapreduce.Job: map 100% reduce 96%
11/06/23 01:20:12 INFO mapreduce.Job: map 100% reduce 97%
11/06/23 01:20:21 INFO mapreduce.Job: map 100% reduce 98%
11/06/23 01:20:30 INFO mapreduce.Job: map 100% reduce 99%
11/06/23 01:20:36 INFO mapreduce.Job: map 100% reduce 100%
11/06/23 01:20:41 INFO mapreduce.Job: Job complete: job_201106230057_0001
11/06/23 01:20:41 INFO mapreduce.Job: Counters: 34
FileInputFormatCounters
BYTES_READ=11455340570
FileSystemCounters
FILE_BYTES_READ=4883250060
FILE_BYTES_WRITTEN=6858563424
HDFS_BYTES_READ=1187689797
HDFS_BYTES_WRITTEN=792001694
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
Job Counters
Data-local map tasks=18
Total time spent by all maps waiting after reserving slots
(ms)=0
Total time spent by all reduces waiting after reserving slots
(ms)=0
Rack-local map tasks=3
SLOTS_MILLIS_MAPS=1523846
SLOTS_MILLIS_REDUCES=574976
Launched map tasks=21
Launched reduce tasks=2
Map-Reduce Framework
Combine input records=0
Combine output records=0
Failed Shuffles=0
GC time elapsed (ms)=39117
Map input records=49000000
Map output bytes=1568000000
Map output records=196000000
Merged Map outputs=18
Reduce input groups=49020000
Reduce input records=196000000
Reduce output records=49010000
Reduce shuffle bytes=1960000108
Shuffled Maps =18
Spilled Records=684324952
SPLIT_RAW_BYTES=2070