Thanks for the logs. I see a lot of issues like the following:
2011-12-01 00:04:46,241 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: rainbow-01/192.168.100.1:30004. Already tried 0 time(s).
2011-12-01 00:04:47,243 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: rainbow-01/192.168.100.1:30004. Already tried 1 time(s).
2011-12-01 00:04:48,245 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: rainbow-01/192.168.100.1:30004. Already tried 2 time(s).
2011-12-01 00:04:49,247 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: rainbow-01/192.168.100.1:30004. Already tried 3 time(s).
2011-12-01 00:04:50,249 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: rainbow-01/192.168.100.1:30004. Already tried 4 time(s).
2011-12-01 00:04:51,251 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: rainbow-01/192.168.100.1:30004. Already tried 5 time(s).
2011-12-01 00:04:52,253 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: rainbow-01/192.168.100.1:30004. Already tried 6 time(s).
2011-12-01 00:04:53,255 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: rainbow-01/192.168.100.1:30004. Already tried 7 time(s).
2011-12-01 00:04:54,256 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: rainbow-01/192.168.100.1:30004. Already tried 8 time(s).
2011-12-01 00:04:55,258 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: rainbow-01/192.168.100.1:30004. Already tried 9 time(s).
2011-12-01 00:04:55,261 WARN
org.apache.giraph.comm.BasicRPCCommunications: connectAllRPCProxys:
Failed on attempt 0 of 5 to connect to
(id=0,cur=Worker(hostname=rainbow-01, MRpartition=4,
port=30004),prev=null,ckpt_file=null)
java.net.ConnectException: Call to rainbow-01/192.168.100.1:30004 failed
on connection exception: java.net.ConnectException: Connection refused
Are you sure that your machines can communicate to each other? Are the
ports 30000 and up blocked? And you're right, you should have only had
6 tasks. What version of Hadoop is this on?
Avery
On 12/1/11 2:43 PM, Inci Cetindil wrote:
Hi Avery,
I attached the logs for the first attemps. The weird thing is even if I
specified the number of workers as 5, I had 8 mapper tasks. You can see the
logs for tasks 6 and 7 failed immediately. Do you have any explanation for this
behavior? Normally I should have 6 tasks, right?
Thanks,
Inci
On Dec 1, 2011, at 11:00 AM, Avery Ching wrote:
Hi Inci,
I am not sure what's wrong. I ran the exact same command on a freshly checked
version of Graph without any trouble. Here's my output:
hadoop jar target/giraph-0.70-jar-with-dependencies.jar
org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5
Using org.apache.giraph.benchmark.PageRankBenchmark$PageRankVertex
11/12/01 10:58:05 WARN bsp.BspOutputFormat: checkOutputSpecs:
ImmutableOutputCommiter will not check anything
11/12/01 10:58:05 INFO mapred.JobClient: Running job: job_201112011054_0003
11/12/01 10:58:06 INFO mapred.JobClient: map 0% reduce 0%
11/12/01 10:58:23 INFO mapred.JobClient: map 16% reduce 0%
11/12/01 10:58:35 INFO mapred.JobClient: map 100% reduce 0%
11/12/01 10:58:40 INFO mapred.JobClient: Job complete: job_201112011054_0003
11/12/01 10:58:40 INFO mapred.JobClient: Counters: 31
11/12/01 10:58:40 INFO mapred.JobClient: Job Counters
11/12/01 10:58:40 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=77566
11/12/01 10:58:40 INFO mapred.JobClient: Total time spent by all reduces
waiting after reserving slots (ms)=0
11/12/01 10:58:40 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0
11/12/01 10:58:40 INFO mapred.JobClient: Launched map tasks=6
11/12/01 10:58:40 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
11/12/01 10:58:40 INFO mapred.JobClient: Giraph Timers
11/12/01 10:58:40 INFO mapred.JobClient: Total (milliseconds)=13468
11/12/01 10:58:40 INFO mapred.JobClient: Superstep 3 (milliseconds)=41
11/12/01 10:58:40 INFO mapred.JobClient: Setup (milliseconds)=11691
11/12/01 10:58:40 INFO mapred.JobClient: Shutdown (milliseconds)=73
11/12/01 10:58:40 INFO mapred.JobClient: Vertex input superstep
(milliseconds)=369
11/12/01 10:58:40 INFO mapred.JobClient: Superstep 0 (milliseconds)=674
11/12/01 10:58:40 INFO mapred.JobClient: Superstep 2 (milliseconds)=519
11/12/01 10:58:40 INFO mapred.JobClient: Superstep 1 (milliseconds)=96
11/12/01 10:58:40 INFO mapred.JobClient: Giraph Stats
11/12/01 10:58:40 INFO mapred.JobClient: Aggregate edges=500
11/12/01 10:58:40 INFO mapred.JobClient: Superstep=4
11/12/01 10:58:40 INFO mapred.JobClient: Last checkpointed superstep=2
11/12/01 10:58:40 INFO mapred.JobClient: Current workers=5
11/12/01 10:58:40 INFO mapred.JobClient: Current master task partition=0
11/12/01 10:58:40 INFO mapred.JobClient: Sent messages=0
11/12/01 10:58:40 INFO mapred.JobClient: Aggregate finished vertices=500
11/12/01 10:58:40 INFO mapred.JobClient: Aggregate vertices=500
11/12/01 10:58:40 INFO mapred.JobClient: File Output Format Counters
11/12/01 10:58:40 INFO mapred.JobClient: Bytes Written=0
11/12/01 10:58:40 INFO mapred.JobClient: FileSystemCounters
11/12/01 10:58:40 INFO mapred.JobClient: FILE_BYTES_READ=590
11/12/01 10:58:40 INFO mapred.JobClient: HDFS_BYTES_READ=264
11/12/01 10:58:40 INFO mapred.JobClient: FILE_BYTES_WRITTEN=129240
11/12/01 10:58:40 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=55080
11/12/01 10:58:40 INFO mapred.JobClient: File Input Format Counters
11/12/01 10:58:40 INFO mapred.JobClient: Bytes Read=0
11/12/01 10:58:40 INFO mapred.JobClient: Map-Reduce Framework
11/12/01 10:58:40 INFO mapred.JobClient: Map input records=6
11/12/01 10:58:40 INFO mapred.JobClient: Spilled Records=0
11/12/01 10:58:40 INFO mapred.JobClient: Map output records=0
11/12/01 10:58:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=264
Would it be possible to send me the logs from the first attempts for every map
task?
i.e. from
Task attempt_201111302343_0002_m_000000_0
Task attempt_201111302343_0002_m_000001_0
Task attempt_201111302343_0002_m_000002_0
Task attempt_201111302343_0002_m_000003_0
Task attempt_201111302343_0002_m_000004_0
Task attempt_201111302343_0002_m_000005_0
I think that could help us find the issue.
Thanks,
Avery
On 12/1/11 1:17 AM, Inci Cetindil wrote:
Hi,
I'm running PageRank benchmark example on a cluster with 1 master + 5 slave
nodes. I have tried it with a large number of vertices; when I failed I decided
to make it run with 500 vertices and 5 workers first. However, it doesn't work
even for 500 vertices.
I am using the latest version of Giraph from the trunk and running the
following command:
hadoop jar giraph-0.70-jar-with-dependencies.jar
org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500 -w 5
I attached the error message that I am receiving. Please let me know if I am
missing something.
Best regards,
Inci