Hi Avery,

I finally succeeded in running the benchmark. The problem was not the port, but the IP resolution.
After removing the mapping from 127.0.0.1 to the node names in the /etc/hosts files, it worked like a charm! I guess Hadoop has a different code path to determine which IP it should listen on, so normal Hadoop jobs worked with the previous network configuration. Thanks for your help!

Inci

On Dec 2, 2011, at 11:06 AM, Avery Ching wrote:

> You can actually set the starting RPC port to change it from 30000 by adding
> the appropriate configuration (i.e. hadoop jar
> giraph-0.70-jar-with-dependencies.jar
> org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.rpcInitialPort=<your
> starting port> -e 1 -s 3 -v -V 500 -w 5).
>
> I would ensure that those ports are open for communication from one node
> in your cluster to another. I don't think that anyone else has run
> into this problem yet...
>
> Since the job does take some time to fail, you might want to start it up and
> then try to telnet to its RPC port from another machine in the cluster and
> see if that succeeds.
>
> Hope that helps,
>
> Avery
>
> On 12/1/11 11:04 PM, Inci Cetindil wrote:
>> I have tried it with various numbers of workers and it only worked with 1
>> worker.
>>
>> I am not running multiple Giraph jobs at the same time; does it always use
>> ports 30000 and up? I checked the used ports with the "netstat" command and
>> didn't see any of the ports 30000-30005.
>>
>> Inci
>>
>> On Dec 1, 2011, at 7:03 PM, Avery Ching wrote:
>>
>>> Hmmm... this is unusual. I wonder if it is tied to the weird number of
>>> tasks you are getting. Can you try it with various numbers of workers
>>> (i.e. 1, 2) and see if it works?
>>>
>>> To me, the connection refused error indicates that perhaps the server
>>> failed to bind to its port (are you running multiple Giraph jobs at the
>>> same time?) or the server died.
>>>
>>> Avery
>>>
>>> On 12/1/11 5:33 PM, Inci Cetindil wrote:
>>>> I am sure the machines can communicate with each other and the ports are
>>>> not blocked.
>>>> I can run the word count Hadoop job without any problem on these
>>>> machines. My Hadoop version is 0.20.203.0.
>>>>
>>>> Thanks,
>>>> Inci
>>>>
>>>> On Dec 1, 2011, at 3:57 PM, Avery Ching wrote:
>>>>
>>>>> Thanks for the logs. I see a lot of issues like the following:
>>>>>
>>>>> 2011-12-01 00:04:46,241 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 0 time(s).
>>>>> 2011-12-01 00:04:47,243 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 1 time(s).
>>>>> 2011-12-01 00:04:48,245 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 2 time(s).
>>>>> 2011-12-01 00:04:49,247 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 3 time(s).
>>>>> 2011-12-01 00:04:50,249 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 4 time(s).
>>>>> 2011-12-01 00:04:51,251 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 5 time(s).
>>>>> 2011-12-01 00:04:52,253 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 6 time(s).
>>>>> 2011-12-01 00:04:53,255 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 7 time(s).
>>>>> 2011-12-01 00:04:54,256 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 8 time(s).
>>>>> 2011-12-01 00:04:55,258 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 9 time(s).
>>>>> 2011-12-01 00:04:55,261 WARN org.apache.giraph.comm.BasicRPCCommunications: connectAllRPCProxys: Failed on attempt 0 of 5 to connect to (id=0,cur=Worker(hostname=rainbow-01, MRpartition=4, port=30004),prev=null,ckpt_file=null)
>>>>> java.net.ConnectException: Call to rainbow-01/192.168.100.1:30004 failed on connection exception: java.net.ConnectException: Connection refused
>>>>>
>>>>> Are you sure that your machines can communicate with each other? Are
>>>>> ports 30000 and up blocked? And you're right, you should have had only
>>>>> 6 tasks. What version of Hadoop is this on?
>>>>>
>>>>> Avery
>>>>>
>>>>> On 12/1/11 2:43 PM, Inci Cetindil wrote:
>>>>>> Hi Avery,
>>>>>>
>>>>>> I attached the logs for the first attempts. The weird thing is that even
>>>>>> though I specified the number of workers as 5, I had 8 mapper tasks.
>>>>>> You can see in the logs that tasks 6 and 7 failed immediately. Do you
>>>>>> have any explanation for this behavior? Normally I should have 6 tasks,
>>>>>> right?
>>>>>>
>>>>>> Thanks,
>>>>>> Inci
>>>>>>
>>>>>> On Dec 1, 2011, at 11:00 AM, Avery Ching wrote:
>>>>>>
>>>>>>> Hi Inci,
>>>>>>>
>>>>>>> I am not sure what's wrong. I ran the exact same command on a freshly
>>>>>>> checked-out version of Giraph without any trouble.
>>>>>>> Here's my output:
>>>>>>>
>>>>>>> hadoop jar target/giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5
>>>>>>> Using org.apache.giraph.benchmark.PageRankBenchmark$PageRankVertex
>>>>>>> 11/12/01 10:58:05 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
>>>>>>> 11/12/01 10:58:05 INFO mapred.JobClient: Running job: job_201112011054_0003
>>>>>>> 11/12/01 10:58:06 INFO mapred.JobClient: map 0% reduce 0%
>>>>>>> 11/12/01 10:58:23 INFO mapred.JobClient: map 16% reduce 0%
>>>>>>> 11/12/01 10:58:35 INFO mapred.JobClient: map 100% reduce 0%
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Job complete: job_201112011054_0003
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Counters: 31
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Job Counters
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=77566
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Launched map tasks=6
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Giraph Timers
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Total (milliseconds)=13468
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep 3 (milliseconds)=41
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Setup (milliseconds)=11691
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Shutdown (milliseconds)=73
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Vertex input superstep (milliseconds)=369
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep 0 (milliseconds)=674
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep 2 (milliseconds)=519
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep 1 (milliseconds)=96
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Giraph Stats
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Aggregate edges=500
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep=4
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Last checkpointed superstep=2
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Current workers=5
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Current master task partition=0
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Sent messages=0
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Aggregate finished vertices=500
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Aggregate vertices=500
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: File Output Format Counters
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Bytes Written=0
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: FileSystemCounters
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: FILE_BYTES_READ=590
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: HDFS_BYTES_READ=264
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: FILE_BYTES_WRITTEN=129240
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=55080
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: File Input Format Counters
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Bytes Read=0
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Map-Reduce Framework
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Map input records=6
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Spilled Records=0
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Map output records=0
>>>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=264
>>>>>>>
>>>>>>> Would it be possible to send me the logs from the first attempts for
>>>>>>> every map task? i.e. from
>>>>>>> Task attempt_201111302343_0002_m_000000_0
>>>>>>> Task attempt_201111302343_0002_m_000001_0
>>>>>>> Task attempt_201111302343_0002_m_000002_0
>>>>>>> Task attempt_201111302343_0002_m_000003_0
>>>>>>> Task attempt_201111302343_0002_m_000004_0
>>>>>>> Task attempt_201111302343_0002_m_000005_0
>>>>>>>
>>>>>>> I think that could help us find the issue.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Avery
>>>>>>>
>>>>>>> On 12/1/11 1:17 AM, Inci Cetindil wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm running the PageRank benchmark example on a cluster with 1 master
>>>>>>>> + 5 slave nodes. I first tried it with a large number of vertices;
>>>>>>>> when that failed, I decided to get it running with 500 vertices and
>>>>>>>> 5 workers first. However, it doesn't work even for 500 vertices.
>>>>>>>> I am using the latest version of Giraph from trunk and running the
>>>>>>>> following command:
>>>>>>>>
>>>>>>>> hadoop jar giraph-0.70-jar-with-dependencies.jar
>>>>>>>> org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5
>>>>>>>>
>>>>>>>> I attached the error message that I am receiving. Please let me know
>>>>>>>> if I am missing something.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Inci
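The root cause Inci found can be sketched in a few lines: if /etc/hosts maps a node's name to 127.0.0.1, a server that binds to whatever address its hostname resolves to ends up listening only on loopback, and workers on other machines get exactly the "Connection refused" seen in the logs above. This is a hypothetical Python illustration, not Giraph or Hadoop code; the helper names and ports are made up for the sketch.

```python
import socket

def bind_rpc_port(listen_addr: str, port: int = 0) -> socket.socket:
    """Bind and listen the way an RPC server would; port 0 = ephemeral."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((listen_addr, port))
    s.listen(5)
    return s

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Avery's telnet check, in code: can we complete a TCP connect?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Bad /etc/hosts entry -> hostname resolves to loopback -> the server is
# reachable only from its own machine, invisible to the rest of the cluster.
loopback_server = bind_rpc_port("127.0.0.1")
addr, port = loopback_server.getsockname()
print(addr)                          # 127.0.0.1
print(port_open("127.0.0.1", port))  # True -- but only from this host

# Binding to the wildcard address accepts connections on every interface,
# which is what fixing the hosts file effectively restores.
wildcard_server = bind_rpc_port("0.0.0.0")
print(wildcard_server.getsockname()[0])  # 0.0.0.0
```

The `port_open` helper is just Avery's telnet suggestion in code form: pointing it at a worker's RPC port from another machine in the cluster distinguishes a loopback-bound or blocked port from a server that died.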
