I am sure the machines can communicate with each other and the ports are not blocked. I can run a word count Hadoop job without any problem on these machines. My Hadoop version is 0.20.203.0.
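One quick way to double-check reachability independently of Hadoop is a small TCP probe run from each slave node toward the master's worker ports. This is only an illustrative sketch (the `port_open` helper is not part of Hadoop or Giraph); the host and port values mirror the ones in the logs below:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failure, timeout, and "Connection refused".
        return False

# Run from each slave node against the master, e.g.:
# port_open("rainbow-01", 30004)  # False means blocked or nothing listening
```

If this returns False from a slave while the worker is up on the master, the problem is networking (firewall, bind address) rather than Giraph itself.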
Thanks,
Inci

On Dec 1, 2011, at 3:57 PM, Avery Ching wrote:

> Thanks for the logs. I see a lot of issues like the following:
>
> 2011-12-01 00:04:46,241 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 0 time(s).
> 2011-12-01 00:04:47,243 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 1 time(s).
> 2011-12-01 00:04:48,245 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 2 time(s).
> 2011-12-01 00:04:49,247 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 3 time(s).
> 2011-12-01 00:04:50,249 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 4 time(s).
> 2011-12-01 00:04:51,251 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 5 time(s).
> 2011-12-01 00:04:52,253 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 6 time(s).
> 2011-12-01 00:04:53,255 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 7 time(s).
> 2011-12-01 00:04:54,256 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 8 time(s).
> 2011-12-01 00:04:55,258 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 9 time(s).
> 2011-12-01 00:04:55,261 WARN org.apache.giraph.comm.BasicRPCCommunications: connectAllRPCProxys: Failed on attempt 0 of 5 to connect to (id=0,cur=Worker(hostname=rainbow-01, MRpartition=4, port=30004),prev=null,ckpt_file=null)
> java.net.ConnectException: Call to rainbow-01/192.168.100.1:30004 failed on connection exception: java.net.ConnectException: Connection refused
>
> Are you sure that your machines can communicate with each other? Are ports 30000 and up blocked? And you're right, you should have had only 6 tasks. What version of Hadoop is this on?
>
> Avery
>
> On 12/1/11 2:43 PM, Inci Cetindil wrote:
>>
>> Hi Avery,
>>
>> I attached the logs for the first attempts. The weird thing is that even though I specified the number of workers as 5, I had 8 mapper tasks. You can see in the logs that tasks 6 and 7 failed immediately. Do you have any explanation for this behavior? Normally I should have 6 tasks, right?
>>
>> Thanks,
>> Inci
>>
>> On Dec 1, 2011, at 11:00 AM, Avery Ching wrote:
>>
>>> Hi Inci,
>>>
>>> I am not sure what's wrong. I ran the exact same command on a freshly checked-out version of Giraph without any trouble.
>>> Here's my output:
>>>
>>> hadoop jar target/giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5
>>> Using org.apache.giraph.benchmark.PageRankBenchmark$PageRankVertex
>>> 11/12/01 10:58:05 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
>>> 11/12/01 10:58:05 INFO mapred.JobClient: Running job: job_201112011054_0003
>>> 11/12/01 10:58:06 INFO mapred.JobClient: map 0% reduce 0%
>>> 11/12/01 10:58:23 INFO mapred.JobClient: map 16% reduce 0%
>>> 11/12/01 10:58:35 INFO mapred.JobClient: map 100% reduce 0%
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Job complete: job_201112011054_0003
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Counters: 31
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Job Counters
>>> 11/12/01 10:58:40 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=77566
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Launched map tasks=6
>>> 11/12/01 10:58:40 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Giraph Timers
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Total (milliseconds)=13468
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep 3 (milliseconds)=41
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Setup (milliseconds)=11691
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Shutdown (milliseconds)=73
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Vertex input superstep (milliseconds)=369
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep 0 (milliseconds)=674
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep 2 (milliseconds)=519
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep 1 (milliseconds)=96
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Giraph Stats
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Aggregate edges=500
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep=4
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Last checkpointed superstep=2
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Current workers=5
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Current master task partition=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Sent messages=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Aggregate finished vertices=500
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Aggregate vertices=500
>>> 11/12/01 10:58:40 INFO mapred.JobClient: File Output Format Counters
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Bytes Written=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: FileSystemCounters
>>> 11/12/01 10:58:40 INFO mapred.JobClient: FILE_BYTES_READ=590
>>> 11/12/01 10:58:40 INFO mapred.JobClient: HDFS_BYTES_READ=264
>>> 11/12/01 10:58:40 INFO mapred.JobClient: FILE_BYTES_WRITTEN=129240
>>> 11/12/01 10:58:40 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=55080
>>> 11/12/01 10:58:40 INFO mapred.JobClient: File Input Format Counters
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Bytes Read=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Map-Reduce Framework
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Map input records=6
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Spilled Records=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Map output records=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=264
>>>
>>> Would it be possible to send me the logs from the first attempts for every map task?
>>>
>>> i.e. from
>>> Task attempt_201111302343_0002_m_000000_0
>>> Task attempt_201111302343_0002_m_000001_0
>>> Task attempt_201111302343_0002_m_000002_0
>>> Task attempt_201111302343_0002_m_000003_0
>>> Task attempt_201111302343_0002_m_000004_0
>>> Task attempt_201111302343_0002_m_000005_0
>>>
>>> I think that could help us find the issue.
>>>
>>> Thanks,
>>>
>>> Avery
>>>
>>> On 12/1/11 1:17 AM, Inci Cetindil wrote:
>>>> Hi,
>>>>
>>>> I'm running the PageRank benchmark example on a cluster with 1 master + 5 slave nodes. I tried it with a large number of vertices; when that failed, I decided to first get it running with 500 vertices and 5 workers. However, it doesn't work even for 500 vertices.
>>>> I am using the latest version of Giraph from trunk and running the following command:
>>>>
>>>> hadoop jar giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5
>>>>
>>>> I attached the error message that I am receiving. Please let me know if I am missing something.
>>>>
>>>> Best regards,
>>>> Inci
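For what it's worth, the repeated "Retrying connect ... Already tried N time(s)" lines in the logs above are just a bounded retry loop in the IPC client. The sketch below is illustrative only (it is not Hadoop's actual implementation, and the parameter names are made up); it shows the general pattern of retrying a connect a fixed number of times before giving up:

```python
import time

def connect_with_retries(connect, max_retries=10, delay=1.0):
    """Call connect() until it succeeds or max_retries attempts are used.

    On each failure, report how many times we have already tried (as in
    the log lines above); the final failure is re-raised to the caller.
    """
    for attempt in range(max_retries):
        try:
            return connect()
        except ConnectionError:
            print(f"Retrying connect to server. Already tried {attempt} time(s).")
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the ConnectException equivalent
            time.sleep(delay)
```

When every attempt fails, as in the thread's logs, the loop exhausts its retries and the connection error propagates, which is what surfaces as the `java.net.ConnectException` in the Giraph warning.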