I have tried it with various numbers of workers and it only worked with 1 
worker.

I am not running multiple Giraph jobs at the same time. Does Giraph always use 
ports 30000 and up? I checked the ports in use with the "netstat" command and 
did not see any of the ports 30000-30005.
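For what it's worth, here is a small probe I can run from a worker machine to 
double-check netstat's answer (just a sketch; the host name "rainbow-01" and 
the port range 30000-30005 are taken from the logs earlier in this thread):

```python
import socket

def probe(host, ports, timeout=1.0):
    """Return the subset of ports on host that accept a TCP connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            try:
                s.connect((host, port))
            except OSError:
                # Covers "Connection refused", timeouts, and unresolvable
                # hosts alike -- the port is not usably open.
                continue
            open_ports.append(port)
    return open_ports

# Host and port range as they appear in the failing logs.
print(probe("rainbow-01", range(30000, 30006)))
```

If this prints an empty list from the machines running the failing tasks, it 
would match the "Connection refused" errors in the logs: nothing is listening 
on those ports when the workers try to connect.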

Inci

On Dec 1, 2011, at 7:03 PM, Avery Ching wrote:

> Hmmm...this is unusual.  I wonder if it is tied to the weird number of tasks 
> you are getting.  Can you try it with various numbers of workers (e.g. 1, 2) 
> and see if it works?
> 
> To me, the connection refused error indicates that either the server failed 
> to bind to its port (are you running multiple Giraph jobs at the same time?) 
> or the server died.
> 
> Avery
> 
> On 12/1/11 5:33 PM, Inci Cetindil wrote:
>> I am sure the machines can communicate with each other and the ports are not 
>> blocked. I can run a WordCount Hadoop job without any problem on these 
>> machines. My Hadoop version is 0.20.203.0.
>> 
>> Thanks,
>> Inci
>> 
>> On Dec 1, 2011, at 3:57 PM, Avery Ching wrote:
>> 
>>> Thanks for the logs.  I see a lot of issues like the following:
>>> 
>>> 2011-12-01 00:04:46,241 INFO org.apache.hadoop.ipc.Client: Retrying connect 
>>> to server: rainbow-01/192.168.100.1:30004. Already tried 0 time(s).
>>> 2011-12-01 00:04:47,243 INFO org.apache.hadoop.ipc.Client: Retrying connect 
>>> to server: rainbow-01/192.168.100.1:30004. Already tried 1 time(s).
>>> 2011-12-01 00:04:48,245 INFO org.apache.hadoop.ipc.Client: Retrying connect 
>>> to server: rainbow-01/192.168.100.1:30004. Already tried 2 time(s).
>>> 2011-12-01 00:04:49,247 INFO org.apache.hadoop.ipc.Client: Retrying connect 
>>> to server: rainbow-01/192.168.100.1:30004. Already tried 3 time(s).
>>> 2011-12-01 00:04:50,249 INFO org.apache.hadoop.ipc.Client: Retrying connect 
>>> to server: rainbow-01/192.168.100.1:30004. Already tried 4 time(s).
>>> 2011-12-01 00:04:51,251 INFO org.apache.hadoop.ipc.Client: Retrying connect 
>>> to server: rainbow-01/192.168.100.1:30004. Already tried 5 time(s).
>>> 2011-12-01 00:04:52,253 INFO org.apache.hadoop.ipc.Client: Retrying connect 
>>> to server: rainbow-01/192.168.100.1:30004. Already tried 6 time(s).
>>> 2011-12-01 00:04:53,255 INFO org.apache.hadoop.ipc.Client: Retrying connect 
>>> to server: rainbow-01/192.168.100.1:30004. Already tried 7 time(s).
>>> 2011-12-01 00:04:54,256 INFO org.apache.hadoop.ipc.Client: Retrying connect 
>>> to server: rainbow-01/192.168.100.1:30004. Already tried 8 time(s).
>>> 2011-12-01 00:04:55,258 INFO org.apache.hadoop.ipc.Client: Retrying connect 
>>> to server: rainbow-01/192.168.100.1:30004. Already tried 9 time(s).
>>> 2011-12-01 00:04:55,261 WARN org.apache.giraph.comm.BasicRPCCommunications: 
>>> connectAllRPCProxys:     Failed on attempt 0 of 5 to connect to 
>>> (id=0,cur=Worker(hostname=rainbow-01, MRpartition=4, 
>>> port=30004),prev=null,ckpt_file=null)
>>> java.net.ConnectException: Call to rainbow-01/192.168.100.1:30004 failed on 
>>> connection exception: java.net.ConnectException: Connection refused
>>> 
>>> Are you sure that your machines can communicate with each other?  Are 
>>> ports 30000 and up blocked?  And you're right, you should have had only 6 
>>> tasks.  What version of Hadoop is this on?
>>> 
>>> Avery
>>> 
>>> On 12/1/11 2:43 PM, Inci Cetindil wrote:
>>>> Hi Avery,
>>>> 
>>>> I attached the logs for the first attempts. The weird thing is that even 
>>>> though I specified the number of workers as 5, I had 8 mapper tasks. You 
>>>> can see in the logs that tasks 6 and 7 failed immediately. Do you have 
>>>> any explanation for this behavior? Normally I should have 6 tasks, right?
>>>> 
>>>> Thanks,
>>>> Inci
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Dec 1, 2011, at 11:00 AM, Avery Ching wrote:
>>>> 
>>>>> Hi Inci,
>>>>> 
>>>>> I am not sure what's wrong.  I ran the exact same command on a freshly 
>>>>> checked-out version of Giraph without any trouble.  Here's my output:
>>>>> 
>>>>> hadoop jar target/giraph-0.70-jar-with-dependencies.jar 
>>>>> org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5
>>>>> Using org.apache.giraph.benchmark.PageRankBenchmark$PageRankVertex
>>>>> 11/12/01 10:58:05 WARN bsp.BspOutputFormat: checkOutputSpecs: 
>>>>> ImmutableOutputCommiter will not check anything
>>>>> 11/12/01 10:58:05 INFO mapred.JobClient: Running job: 
>>>>> job_201112011054_0003
>>>>> 11/12/01 10:58:06 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 11/12/01 10:58:23 INFO mapred.JobClient:  map 16% reduce 0%
>>>>> 11/12/01 10:58:35 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Job complete: 
>>>>> job_201112011054_0003
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient: Counters: 31
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   Job Counters
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=77566
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Total time spent by all 
>>>>> reduces waiting after reserving slots (ms)=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Total time spent by all maps 
>>>>> waiting after reserving slots (ms)=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Launched map tasks=6
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   Giraph Timers
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Total (milliseconds)=13468
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Superstep 3 (milliseconds)=41
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Setup (milliseconds)=11691
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Shutdown (milliseconds)=73
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Vertex input superstep 
>>>>> (milliseconds)=369
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Superstep 0 
>>>>> (milliseconds)=674
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Superstep 2 
>>>>> (milliseconds)=519
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Superstep 1 (milliseconds)=96
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   Giraph Stats
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Aggregate edges=500
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Superstep=4
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Last checkpointed superstep=2
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Current workers=5
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Current master task 
>>>>> partition=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Sent messages=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Aggregate finished 
>>>>> vertices=500
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Aggregate vertices=500
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   File Output Format Counters
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Bytes Written=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   FileSystemCounters
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     FILE_BYTES_READ=590
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     HDFS_BYTES_READ=264
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=129240
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=55080
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   File Input Format Counters
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Bytes Read=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Map input records=6
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Spilled Records=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     Map output records=0
>>>>> 11/12/01 10:58:40 INFO mapred.JobClient:     SPLIT_RAW_BYTES=264
>>>>> 
>>>>> 
>>>>> Would it be possible to send me the logs from the first attempts for 
>>>>> every map task?
>>>>> 
>>>>> i.e. from
>>>>> Task attempt_201111302343_0002_m_000000_0
>>>>> Task attempt_201111302343_0002_m_000001_0
>>>>> Task attempt_201111302343_0002_m_000002_0
>>>>> Task attempt_201111302343_0002_m_000003_0
>>>>> Task attempt_201111302343_0002_m_000004_0
>>>>> Task attempt_201111302343_0002_m_000005_0
>>>>> 
>>>>> I think that could help us find the issue.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Avery
>>>>> 
>>>>> On 12/1/11 1:17 AM, Inci Cetindil wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> I'm running the PageRank benchmark example on a cluster with 1 master + 
>>>>>> 5 slave nodes. I first tried it with a large number of vertices; when 
>>>>>> that failed, I decided to get it running with 500 vertices and 5 
>>>>>> workers first. However, it doesn't work even for 500 vertices.
>>>>>> I am using the latest version of Giraph from the trunk and running the 
>>>>>> following command:
>>>>>> 
>>>>>> hadoop jar giraph-0.70-jar-with-dependencies.jar 
>>>>>> org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5
>>>>>> 
>>>>>> I attached the error message that I am receiving. Please let me know if 
>>>>>> I am missing something.
>>>>>> 
>>>>>> Best regards,
>>>>>> Inci
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
> 
