Hello, 

I have a more general question regarding the number of workers: 

How does the number of workers relate to the utilisation of the CPU cores on 
which the Giraph job is run ? 

In my situation, if I start the job with 10 workers or 20 or 30 workers, it 
takes about the same time to finish. 
CPU utilisation is also the same in every case: about 75% of the machine 
stays idle regardless of the number of workers. 

Is there another way to make Giraph utilise the CPU cores better ? 

Is there a general explanation for this behaviour of Giraph ? 

If not, then maybe there is an explanation which is specific to my job, the 
hadoop setup and the hardware: 
* I use about 6.5 GB of input data
* Importing the input data takes between 11 and 14 minutes (see the 
vertex input superstep timing below) 
* The hadoop setup is a single node / pseudo distributed hadoop 1.0.1 
installation
* The machine has 24 cores and 120 GB of RAM, and runs (some form of) Linux

Is it possible that the vertex input could be parallelised in a better way in 
Giraph ? 

cheers, Benjamin. 

12/04/02 13:47:29 INFO mapred.JobClient:   Giraph Timers
12/04/02 13:47:29 INFO mapred.JobClient:     Total (milliseconds)=828120
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 3 (milliseconds)=11320
12/04/02 13:47:29 INFO mapred.JobClient:     Setup (milliseconds)=51446
12/04/02 13:47:29 INFO mapred.JobClient:     Shutdown (milliseconds)=14343
12/04/02 13:47:29 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=682368 -> ~11 minutes
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 0 (milliseconds)=17495 -> 17 seconds
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 4 (milliseconds)=22737 -> 22 seconds
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 2 (milliseconds)=23395 -> 23 seconds
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep 1 (milliseconds)=5013 -> 5 seconds
12/04/02 13:47:29 INFO mapred.JobClient:   Giraph Stats
12/04/02 13:47:29 INFO mapred.JobClient:     Aggregate edges=61475601
12/04/02 13:47:29 INFO mapred.JobClient:     Superstep=5
12/04/02 13:47:29 INFO mapred.JobClient:     Last checkpointed superstep=4
12/04/02 13:47:29 INFO mapred.JobClient:     Current workers=18
12/04/02 13:47:29 INFO mapred.JobClient:     Current master task partition=0
12/04/02 13:47:29 INFO mapred.JobClient:     Sent messages=0
12/04/02 13:47:29 INFO mapred.JobClient:     Aggregate finished 
vertices=10430616
12/04/02 13:47:29 INFO mapred.JobClient:     Aggregate vertices=10430616
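
To put the timers above in perspective, here is a minimal Python sketch that works out each phase's share of the total run time. It is only arithmetic on the logged values, not Giraph output. The names `timers_ms` and `phase_shares` are made up for illustration, and the throughput figure assumes the "about 6.5 GB" of input means 6.5 * 1024 MB, so treat it as a rough estimate.

```python
# Timer values in milliseconds, copied from the "Giraph Timers" log above.
timers_ms = {
    "Setup": 51446,
    "Vertex input superstep": 682368,
    "Superstep 0": 17495,
    "Superstep 1": 5013,
    "Superstep 2": 23395,
    "Superstep 3": 11320,
    "Superstep 4": 22737,
    "Shutdown": 14343,
}
total_ms = 828120  # "Total (milliseconds)" from the same log


def phase_shares(timers, total):
    """Return each phase's share of the total run time, in percent."""
    return {name: 100.0 * ms / total for name, ms in timers.items()}


shares = phase_shares(timers_ms, total_ms)

# Print the phases, largest share first.
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<24} {timers_ms[name]:>7} ms  {pct:5.1f}%")

# Assumption: "about 6.5 GB" of input is taken as 6.5 * 1024 MB.
input_mb = 6.5 * 1024
input_seconds = timers_ms["Vertex input superstep"] / 1000.0
throughput_mb_s = input_mb / input_seconds
print(f"approximate input throughput: {throughput_mb_s:.1f} MB/s")
```

By this calculation the vertex input superstep accounts for roughly 82% of the run, so the worker count would mainly need to speed up that phase to change the total time noticeably.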
