may be running on different input. Tell us if this is a map/reduce problem or a data problem.

Thanks,
lohit
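A quick way to answer lohit's question from the job output alone is to compare the "Map input bytes" counter with the file size that "hadoop dfs -ls" reports. The shell sketch below assumes the JobClient output was captured to a file (for example with "hadoop jar hadoop-examples.jar wordcount /clickdir /wordTEST3 2>&1 | tee job.log"). The helper name check_input_coverage and the file name job.log are hypothetical. Like Matt, it assumes the counter reflects how much of the input the maps consumed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare the "Map input bytes" counter from a saved
# JobClient log with the input file size reported by "hadoop dfs -ls".
# Usage: check_input_coverage LOGFILE FILE_SIZE_BYTES
check_input_coverage() {
  local log="$1" file_size="$2"
  local read_bytes
  # Extract the number after "Map input bytes=". If the log holds several
  # jobs, keep the last one.
  read_bytes=$(sed -n 's/.*Map input bytes=\([0-9][0-9]*\).*/\1/p' "$log" | tail -n 1)
  if [ -z "$read_bytes" ]; then
    echo "no 'Map input bytes' counter found in $log" >&2
    return 1
  fi
  # Integer percentage, rounded down (bash arithmetic is 64-bit).
  local pct=$(( read_bytes * 100 / file_size ))
  echo "maps read $read_bytes of $file_size bytes (${pct}%)"
}
```

Against the job quoted below, "check_input_coverage job.log 2270607035" should print "maps read 67239230 of 2270607035 bytes (2%)". That is roughly one block's worth of a 34-block file.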
----- Original Message ----
From: Ted Dunning <[EMAIL PROTECTED]>
To: hadoop-user@lucene.apache.org
Sent: Friday, January 18, 2008 9:04:37 AM
Subject: Re: Hadoop only processing the first 64 meg block of a 2 gig file

Look at the map/reduce control panel on the web to look at your map tasks.
If you drill all the way down, you can look at the output from the tasks.
There is a good chance that your map task is exiting abnormally.

On 1/18/08 8:37 AM, "Matt Herndon" <[EMAIL PROTECTED]> wrote:

> Yep, I can see all 34 blocks and view chunks of actual data from each
> using the web interface (quite a nifty tool). Any other suggestions?
>
> --Matt
>
> -----Original Message-----
> From: Ted Dunning [mailto:[EMAIL PROTECTED]
> Sent: Friday, January 18, 2008 11:23 AM
> To: hadoop-user@lucene.apache.org
> Subject: Re: Hadoop only processing the first 64 meg block of a 2 gig file
>
> Go into the web interface and look at the file.
>
> See if you can see all of the blocks.
>
> On 1/18/08 7:46 AM, "Matt Herndon" <[EMAIL PROTECTED]> wrote:
>
>> Hello,
>>
>> I'm trying to get Hadoop to process a 2 gig file, but it seems to only be
>> processing the first block. I'm running the exact Hadoop VMware image
>> that is available here http://dl.google.com/edutools/hadoop-vmware.zip
>> without any tweaks or modifications to it. I think my file has been
>> properly loaded into HDFS (HDFS reports it as having 2270607035 bytes),
>> but when I run the example wordcount task it only seems to operate on
>> the first 64 meg chunk (Map input bytes is reported as 67239230 when the
>> job completes). Is the image set up to only run the first block, and if
>> so, how do I change this so it runs over the whole file? Any help would
>> be greatly appreciated.
>>
>> Thanks,
>>
>> --Matt
>>
>> P.S.
>> Here are the commands I've actually run to verify that the file is
>> in the hdfs and to run the wordcount example, along with their output:
>>
>> hadoop dfs -ls /clickdir
>>
>> Found 1 items
>> /clickdir/cf709.txt <r 1> 2270607035
>>
>> hadoop jar hadoop-examples.jar wordcount /clickdir /wordTEST3
>>
>> 08/01/18 00:18:59 INFO mapred.FileInputFormat: Total input paths to process : 1
>> 08/01/18 00:19:00 INFO mapred.JobClient: Running job: job_0023
>> 08/01/18 00:19:01 INFO mapred.JobClient: map 0% reduce 0%
>> 08/01/18 00:19:28 INFO mapred.JobClient: map 2% reduce 0%
>> 08/01/18 00:19:34 INFO mapred.JobClient: map 3% reduce 0%
>> 08/01/18 00:19:37 INFO mapred.JobClient: map 5% reduce 0%
>> 08/01/18 00:19:43 INFO mapred.JobClient: map 6% reduce 1%
>> 08/01/18 00:19:45 INFO mapred.JobClient: map 9% reduce 1%
>> 08/01/18 00:19:54 INFO mapred.JobClient: map 12% reduce 2%
>> 08/01/18 00:20:02 INFO mapred.JobClient: map 15% reduce 3%
>> 08/01/18 00:20:11 INFO mapred.JobClient: map 18% reduce 4%
>> 08/01/18 00:20:19 INFO mapred.JobClient: map 21% reduce 4%
>> 08/01/18 00:20:25 INFO mapred.JobClient: map 21% reduce 6%
>> 08/01/18 00:20:26 INFO mapred.JobClient: map 24% reduce 6%
>> 08/01/18 00:20:34 INFO mapred.JobClient: map 27% reduce 7%
>> 08/01/18 00:20:45 INFO mapred.JobClient: map 27% reduce 8%
>> 08/01/18 00:20:46 INFO mapred.JobClient: map 30% reduce 8%
>> 08/01/18 00:20:54 INFO mapred.JobClient: map 33% reduce 8%
>> 08/01/18 00:20:56 INFO mapred.JobClient: map 33% reduce 9%
>> 08/01/18 00:21:03 INFO mapred.JobClient: map 36% reduce 10%
>> 08/01/18 00:21:11 INFO mapred.JobClient: map 39% reduce 11%
>> 08/01/18 00:21:19 INFO mapred.JobClient: map 41% reduce 12%
>> 08/01/18 00:21:25 INFO mapred.JobClient: map 44% reduce 13%
>> 08/01/18 00:21:31 INFO mapred.JobClient: map 47% reduce 13%
>> 08/01/18 00:21:36 INFO mapred.JobClient: map 50% reduce 14%
>> 08/01/18 00:21:42 INFO mapred.JobClient: map 53% reduce 16%
>> 08/01/18 00:21:47 INFO mapred.JobClient: map 56% reduce 16%
>> 08/01/18 00:21:52 INFO mapred.JobClient: map 59% reduce 17%
>> 08/01/18 00:21:56 INFO mapred.JobClient: map 62% reduce 18%
>> 08/01/18 00:22:01 INFO mapred.JobClient: map 65% reduce 19%
>> 08/01/18 00:22:06 INFO mapred.JobClient: map 68% reduce 20%
>> 08/01/18 00:22:11 INFO mapred.JobClient: map 71% reduce 20%
>> 08/01/18 00:22:15 INFO mapred.JobClient: map 74% reduce 22%
>> 08/01/18 00:22:20 INFO mapred.JobClient: map 77% reduce 24%
>> 08/01/18 00:22:25 INFO mapred.JobClient: map 80% reduce 24%
>> 08/01/18 00:22:30 INFO mapred.JobClient: map 83% reduce 25%
>> 08/01/18 00:22:35 INFO mapred.JobClient: map 86% reduce 27%
>> 08/01/18 00:22:40 INFO mapred.JobClient: map 89% reduce 28%
>> 08/01/18 00:22:45 INFO mapred.JobClient: map 89% reduce 29%
>> 08/01/18 00:22:46 INFO mapred.JobClient: map 91% reduce 29%
>> 08/01/18 00:22:51 INFO mapred.JobClient: map 94% reduce 30%
>> 08/01/18 00:22:56 INFO mapred.JobClient: map 97% reduce 30%
>> 08/01/18 00:23:06 INFO mapred.JobClient: map 98% reduce 32%
>> 08/01/18 00:25:06 INFO mapred.JobClient: map 99% reduce 32%
>> 08/01/18 00:26:16 INFO mapred.JobClient: map 100% reduce 32%
>> 08/01/18 00:27:08 INFO mapred.JobClient: map 100% reduce 66%
>> 08/01/18 00:27:16 INFO mapred.JobClient: map 100% reduce 71%
>> 08/01/18 00:27:27 INFO mapred.JobClient: map 100% reduce 77%
>> 08/01/18 00:27:28 INFO mapred.JobClient: map 100% reduce 78%
>> 08/01/18 00:27:37 INFO mapred.JobClient: map 100% reduce 100%
>> 08/01/18 00:27:38 INFO mapred.JobClient: Job complete: job_0023
>> 08/01/18 00:27:38 INFO mapred.JobClient: Counters: 11
>> 08/01/18 00:27:38 INFO mapred.JobClient: org.apache.hadoop.examples.WordCount$Counter
>> 08/01/18 00:27:38 INFO mapred.JobClient: WORDS=13050362
>> 08/01/18 00:27:38 INFO mapred.JobClient: VALUES=13976767
>> 08/01/18 00:27:38 INFO mapred.JobClient: Map-Reduce Framework
>> 08/01/18 00:27:38 INFO mapred.JobClient: Map input records=277434
>> 08/01/18 00:27:38 INFO mapred.JobClient: Map output records=13050362
>> 08/01/18 00:27:38 INFO mapred.JobClient: Map input bytes=67239230
>> 08/01/18 00:27:38 INFO mapred.JobClient: Map output bytes=118620427
>> 08/01/18 00:27:38 INFO mapred.JobClient: Combine input records=13050362
>> 08/01/18 00:27:38 INFO mapred.JobClient: Combine output records=926405
>> 08/01/18 00:27:38 INFO mapred.JobClient: Reduce input groups=709097
>> 08/01/18 00:27:38 INFO mapred.JobClient: Reduce input records=926405
>> 08/01/18 00:27:38 INFO mapred.JobClient: Reduce output records=709097
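The sizes quoted in this thread can be cross-checked with some integer arithmetic. The sketch below assumes the block size is exactly 64 MB (64 * 1024 * 1024 = 67108864 bytes), as the subject line suggests. blocks_for is a hypothetical helper. It shows that a 2270607035-byte file spans 34 blocks, which matches the 34 blocks Matt saw in the web interface. It also shows that the reported Map input bytes is only slightly more than a single block.

```bash
#!/usr/bin/env bash
# Hypothetical helper: number of blocks a file of SIZE bytes occupies,
# i.e. ceil(size / block_size) done with integer arithmetic.
blocks_for() {
  local size="$1" block_size="$2"
  echo $(( (size + block_size - 1) / block_size ))
}

BLOCK=$((64 * 1024 * 1024))   # 67108864 bytes, assumed HDFS block size
FILE=2270607035               # size reported by "hadoop dfs -ls /clickdir"
MAP_INPUT=67239230            # "Map input bytes" counter from job_0023

echo "blocks in file: $(blocks_for "$FILE" "$BLOCK")"           # prints 34
echo "bytes in last block: $(( FILE - (FILE / BLOCK) * BLOCK ))" # prints 56014523
echo "map input beyond one block: $(( MAP_INPUT - BLOCK ))"      # prints 130366
```

The first 33 blocks are full and the 34th holds the remaining 56014523 bytes. Map input bytes covers about one block's worth, roughly 1/34 of the file, which is why the job looked as if it had stopped after the first block.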