I am sure the machines can communicate with each other and the ports are not blocked. I can run a word count Hadoop job without any problem on these machines. My Hadoop version is 0.20.203.0.
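One quick way to double-check reachability independently of Hadoop is a small TCP probe run from each slave node toward the master's worker ports. This is only an illustrative sketch (the `port_open` helper is not part of Hadoop or Giraph); the host and port values mirror the ones in the logs below:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failure, timeout, and "Connection refused".
        return False

# Run from each slave node against the master, e.g.:
# port_open("rainbow-01", 30004)  # False means blocked or nothing listening
```

If this returns False from a slave while the worker is up on the master, the problem is networking (firewall, bind address) rather than Giraph itself.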
Thanks,
Inci

On Dec 1, 2011, at 3:57 PM, Avery Ching wrote:

> Thanks for the logs. I see a lot of issues like the following:
>
> 2011-12-01 00:04:46,241 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 0 time(s).
> 2011-12-01 00:04:47,243 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 1 time(s).
> 2011-12-01 00:04:48,245 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 2 time(s).
> 2011-12-01 00:04:49,247 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 3 time(s).
> 2011-12-01 00:04:50,249 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 4 time(s).
> 2011-12-01 00:04:51,251 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 5 time(s).
> 2011-12-01 00:04:52,253 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 6 time(s).
> 2011-12-01 00:04:53,255 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 7 time(s).
> 2011-12-01 00:04:54,256 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 8 time(s).
> 2011-12-01 00:04:55,258 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: rainbow-01/192.168.100.1:30004. Already tried 9 time(s).
> 2011-12-01 00:04:55,261 WARN org.apache.giraph.comm.BasicRPCCommunications: connectAllRPCProxys: Failed on attempt 0 of 5 to connect to (id=0,cur=Worker(hostname=rainbow-01, MRpartition=4, port=30004),prev=null,ckpt_file=null)
> java.net.ConnectException: Call to rainbow-01/192.168.100.1:30004 failed on connection exception: java.net.ConnectException: Connection refused
>
> Are you sure that your machines can communicate with each other? Are ports 30000 and up blocked? And you're right, you should have had only 6 tasks. What version of Hadoop is this on?
>
> Avery
>
> On 12/1/11 2:43 PM, Inci Cetindil wrote:
>>
>> Hi Avery,
>>
>> I attached the logs for the first attempts. The weird thing is that even though I specified the number of workers as 5, I had 8 mapper tasks. You can see in the logs that tasks 6 and 7 failed immediately. Do you have any explanation for this behavior? Normally I should have 6 tasks, right?
>>
>> Thanks,
>> Inci
>>
>> On Dec 1, 2011, at 11:00 AM, Avery Ching wrote:
>>
>>> Hi Inci,
>>>
>>> I am not sure what's wrong. I ran the exact same command on a freshly checked-out version of Giraph without any trouble.
>>> Here's my output:
>>>
>>> hadoop jar target/giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5
>>> Using org.apache.giraph.benchmark.PageRankBenchmark$PageRankVertex
>>> 11/12/01 10:58:05 WARN bsp.BspOutputFormat: checkOutputSpecs: ImmutableOutputCommiter will not check anything
>>> 11/12/01 10:58:05 INFO mapred.JobClient: Running job: job_201112011054_0003
>>> 11/12/01 10:58:06 INFO mapred.JobClient: map 0% reduce 0%
>>> 11/12/01 10:58:23 INFO mapred.JobClient: map 16% reduce 0%
>>> 11/12/01 10:58:35 INFO mapred.JobClient: map 100% reduce 0%
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Job complete: job_201112011054_0003
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Counters: 31
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Job Counters
>>> 11/12/01 10:58:40 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=77566
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Launched map tasks=6
>>> 11/12/01 10:58:40 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Giraph Timers
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Total (milliseconds)=13468
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep 3 (milliseconds)=41
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Setup (milliseconds)=11691
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Shutdown (milliseconds)=73
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Vertex input superstep (milliseconds)=369
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep 0 (milliseconds)=674
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep 2 (milliseconds)=519
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep 1 (milliseconds)=96
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Giraph Stats
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Aggregate edges=500
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Superstep=4
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Last checkpointed superstep=2
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Current workers=5
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Current master task partition=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Sent messages=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Aggregate finished vertices=500
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Aggregate vertices=500
>>> 11/12/01 10:58:40 INFO mapred.JobClient: File Output Format Counters
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Bytes Written=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: FileSystemCounters
>>> 11/12/01 10:58:40 INFO mapred.JobClient: FILE_BYTES_READ=590
>>> 11/12/01 10:58:40 INFO mapred.JobClient: HDFS_BYTES_READ=264
>>> 11/12/01 10:58:40 INFO mapred.JobClient: FILE_BYTES_WRITTEN=129240
>>> 11/12/01 10:58:40 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=55080
>>> 11/12/01 10:58:40 INFO mapred.JobClient: File Input Format Counters
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Bytes Read=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Map-Reduce Framework
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Map input records=6
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Spilled Records=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: Map output records=0
>>> 11/12/01 10:58:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=264
>>>
>>> Would it be possible to send me the logs from the first attempts for every map task?
>>>
>>> i.e. from
>>> Task attempt_201111302343_0002_m_000000_0
>>> Task attempt_201111302343_0002_m_000001_0
>>> Task attempt_201111302343_0002_m_000002_0
>>> Task attempt_201111302343_0002_m_000003_0
>>> Task attempt_201111302343_0002_m_000004_0
>>> Task attempt_201111302343_0002_m_000005_0
>>>
>>> I think that could help us find the issue.
>>>
>>> Thanks,
>>>
>>> Avery
>>>
>>> On 12/1/11 1:17 AM, Inci Cetindil wrote:
>>>> Hi,
>>>>
>>>> I'm running the PageRank benchmark example on a cluster with 1 master + 5 slave nodes. I tried it with a large number of vertices; when that failed, I decided to first get it running with 500 vertices and 5 workers. However, it doesn't work even for 500 vertices.
>>>> I am using the latest version of Giraph from trunk and running the following command:
>>>>
>>>> hadoop jar giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5
>>>>
>>>> I attached the error message that I am receiving. Please let me know if I am missing something.
>>>>
>>>> Best regards,
>>>> Inci
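For what it's worth, the repeated "Retrying connect ... Already tried N time(s)" lines in the logs above are just a bounded retry loop in the IPC client. The sketch below is illustrative only (it is not Hadoop's actual implementation, and the parameter names are made up); it shows the general pattern of retrying a connect a fixed number of times before giving up:

```python
import time

def connect_with_retries(connect, max_retries=10, delay=1.0):
    """Call connect() until it succeeds or max_retries attempts are used.

    On each failure, report how many times we have already tried (as in
    the log lines above); the final failure is re-raised to the caller.
    """
    for attempt in range(max_retries):
        try:
            return connect()
        except ConnectionError:
            print(f"Retrying connect to server. Already tried {attempt} time(s).")
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the ConnectException equivalent
            time.sleep(delay)
```

When every attempt fails, as in the thread's logs, the loop exhausts its retries and the connection error propagates, which is what surfaces as the `java.net.ConnectException` in the Giraph warning.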