On Thu, Oct 1, 2009 at 9:30 PM, Taylor, Ronald C <[email protected]> wrote:
> 1) yes, we are planning on switching to 0.20. Just haven't yet. So -
> that might be the first thing to do.

If you can, please do. Big difference, and 0.19.x is oh so six months old
by now.

> 2) re the # of reducers: at the start of my run fn, just after defining
> a jobConf object, I do a
>
>     jobConf.setNumReduceTasks(2)
>
> Wasn't sure if that setting was per node or for the entire 10-node
> cluster, so I also tried
>
>     jobConf.setNumReduceTasks(19)
>
> Didn't make any difference - program still failed at 66%

How many are running when you change the above? 2 in the first case and
19 in the second? Is it a small table? Or a new table? They might be
beating up on one region. Better in 0.20.0.

> 4) re the debugging suggestions: noted, and I'll see what I can do.
>
> Thanks for the quick reply. I leave on a trip tomorrow morn, back next
> Thursday, so - I'll be working on this as soon as I get back.

Good stuff. We'll be here when you get back Thursday.
Go easy,
St.Ack

> Ron
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> stack
> Sent: Thursday, October 01, 2009 9:14 PM
> To: [email protected]
> Subject: Re: Got a problem using Hbase as a MR sink - program hangs in
> the reduce part
>
> Can you run 0.20.0?
>
> 66% is when it starts writing to HBase.
>
> How many reducers?
>
> Enable DEBUG logging (see the FAQ for how).
>
> These are odd in that they are saying that the reduce task was dead --
> no progress reported -- over ten whole minutes:
>
> > attempt_200908131056_0004_r_000000_1 failed to report status for 603
> > seconds. Killing!
>
> Can you find that task in the MapReduce UI and see what was going on?
>
> You've read the 'Getting Started' where it talks about upping file
> descriptors, xceivers, and applying the HDFS-127 patch to your hadoop
> cluster?
> Yours,
> St.Ack
>
>
> On Thu, Oct 1, 2009 at 5:24 PM, Taylor, Ronald C
> <[email protected]> wrote:
>
> > Hi folks,
> >
> > I am trying to run a simple MapReduce program that sums the number of
> > entries in a list in a column in an HBase table and then places that
> > sum back into the table. A simple task, in theory - I am just trying
> > out MapReduce programming combined with HBase use, i.e., using an
> > HBase table as both a data source and a sink for the output.
> >
> > So - I get the screen error output below. The program fails at 66%
> > into the reduce. I don't know why - I have rerun it and it fails at
> > the same point. I am doing this on a 10-node Linux cluster using
> > Hadoop 0.19.1 and HBase 0.19.3.
> >
> > I don't see any clues in the master HBase and Hadoop logs. No errors
> > are reported that I can see - though I cheerfully admit to being a
> > complete novice at interpreting the log output.
> >
> > I'm hoping this is something simple - perhaps some parameter I forgot
> > to set? I am hoping the screen output below might provide guidance to
> > somebody with more experience. Could very much use some help.
> >
> >  - Ron Taylor
> >
> > ___________________________________________
> > Ronald Taylor, Ph.D.
> > Computational Biology & Bioinformatics Group
> > Pacific Northwest National Laboratory
> > 902 Battelle Boulevard
> > P.O. Box 999, MSIN K7-90
> > Richland, WA 99352 USA
> > Office: 509-372-6568
> > Email: [email protected]
> > www.pnl.gov
> >
> > %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> >
> > Working in this directory:
> >
> >     had...@neptune:/share/apps/RonWork/MR
> >
> > Command issued:
> >
> >     /share/apps/hadoop/hadoop-0.19.1/bin/hadoop jar
> >     jarredBinTableMRSummation.jar binTableMRSummation
> >
> > Screen output:
> >
> > 09/10/01 16:24:53 WARN mapred.JobClient: Use GenericOptionsParser for
> > parsing the arguments. Applications should implement Tool for the
> > same.
> > 09/10/01 16:24:53 INFO mapred.TableInputFormatBase: split:
> > 0->compute-0-0.local:,
> > 09/10/01 16:24:54 INFO mapred.JobClient: Running job:
> > job_200908131056_0004
> > 09/10/01 16:24:55 INFO mapred.JobClient:  map 0% reduce 0%
> > 09/10/01 16:25:27 INFO mapred.JobClient:  map 100% reduce 0%
> > 09/10/01 16:25:38 INFO mapred.JobClient:  map 100% reduce 33%
> > 09/10/01 16:25:43 INFO mapred.JobClient:  map 100% reduce 66%
> > 09/10/01 16:35:40 INFO mapred.JobClient:  map 100% reduce 33%
> > 09/10/01 16:35:41 INFO mapred.JobClient: Task Id :
> > attempt_200908131056_0004_r_000000_0, Status : FAILED
> > Task attempt_200908131056_0004_r_000000_0 failed to report status for
> > 603 seconds. Killing!
> > 09/10/01 16:35:46 INFO mapred.JobClient:  map 100% reduce 0%
> > 09/10/01 16:35:46 INFO mapred.JobClient: Task Id :
> > attempt_200908131056_0004_r_000001_0, Status : FAILED
> > Task attempt_200908131056_0004_r_000001_0 failed to report status for
> > 602 seconds. Killing!
> > 09/10/01 16:35:51 INFO mapred.JobClient:  map 100% reduce 33%
> > 09/10/01 16:35:56 INFO mapred.JobClient:  map 100% reduce 66%
> > 09/10/01 16:45:55 INFO mapred.JobClient:  map 100% reduce 33%
> > 09/10/01 16:45:55 INFO mapred.JobClient: Task Id :
> > attempt_200908131056_0004_r_000000_1, Status : FAILED
> > Task attempt_200908131056_0004_r_000000_1 failed to report status for
> > 603 seconds. Killing!
> > 09/10/01 16:45:55 INFO mapred.JobClient: Task Id :
> > attempt_200908131056_0004_r_000000_2, Status : FAILED
> > Task attempt_200908131056_0004_r_000000_2 failed to report status for
> > 603 seconds. Killing!
> > 09/10/01 16:46:00 INFO mapred.JobClient: Task Id :
> > attempt_200908131056_0004_r_000001_1, Status : FAILED
> > Task attempt_200908131056_0004_r_000001_1 failed to report status for
> > 603 seconds. Killing!
> > 09/10/01 16:46:06 INFO mapred.JobClient: Task Id :
> > attempt_200908131056_0004_r_000001_2, Status : FAILED
> > Task attempt_200908131056_0004_r_000001_2 failed to report status for
> > 603 seconds. Killing!
> >
> > <manually killed via control-C at this point>
> >
> > 09/10/01 16:46:15 INFO mapred.JobClient:  map 100% reduce 66%
> > Killed by signal 2.
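[The "failed to report status for 603 seconds. Killing!" lines above come
from the mapred.task.timeout setting, which defaults to 600000 ms (10
minutes) in Hadoop 0.19: a task that reports no progress for that long is
killed. The right fix is usually to make the reduce report progress, but
as a stopgap the timeout can be raised in hadoop-site.xml - a sketch, with
the 30-minute value chosen purely for illustration:]

```xml
<!-- hadoop-site.xml: raise the per-task progress timeout from the
     default 600000 ms (10 min). Workaround only; a reduce that writes
     to HBase should still report progress regularly. -->
<property>
  <name>mapred.task.timeout</name>
  <value>1800000</value>
</property>
```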
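[For readers following the thread, the summation Ron describes can be
sketched in isolation. The storage format of the per-row list is not shown
anywhere in the thread, so this assumes - purely for illustration - that
each cell holds a single comma-delimited string; in the real job this
count would be computed in the reduce step and written back to the table
(via BatchUpdate in the HBase 0.19 API):]

```java
// Minimal sketch, not Ron's actual code. Assumed cell format: one
// comma-delimited string per row, e.g. "geneA,geneB,geneC" -> 3.
public class EntryCounter {

    // Count the entries in one comma-delimited cell value.
    static int countEntries(String cellValue) {
        if (cellValue == null || cellValue.isEmpty()) {
            return 0;  // empty cell: no entries to count
        }
        return cellValue.split(",").length;
    }

    public static void main(String[] args) {
        System.out.println(countEntries("geneA,geneB,geneC")); // prints 3
    }
}
```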
