-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17336/#review32958
-----------------------------------------------------------
This is super awesome. I have a few questions/comments.


giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java
<https://reviews.apache.org/r/17336/#comment62026>

    Should we make it true by default if there isn't much overhead?


giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java
<https://reviews.apache.org/r/17336/#comment62051>

    Maybe a better way would be time-based, since the rest of the update logic is time-based? I.e., if interval / 2 seconds have passed, update.


giraph-core/src/main/java/org/apache/giraph/worker/WorkerProgress.java
<https://reviews.apache.org/r/17336/#comment62031>

    Stores information about a worker's progress that is periodically written to ZooKeeper with WorkerProgressWriter.


giraph-core/src/main/java/org/apache/giraph/worker/WorkerProgress.java
<https://reviews.apache.org/r/17336/#comment62028>

    Please add the @ThreadSafe annotation.


giraph-core/src/main/java/org/apache/giraph/worker/WorkerProgress.java
<https://reviews.apache.org/r/17336/#comment62027>

    This can be final.


giraph-core/src/main/java/org/apache/giraph/worker/WorkerProgressWriter.java
<https://reviews.apache.org/r/17336/#comment62029>

    Every 5 seconds seems quite often, no? ZooKeeper is limited in its write throughput. Maybe every 10 or 15 seconds would be better?


- Avery Ching


On Jan. 24, 2014, 10:56 p.m., Maja Kabiljo wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17336/
> -----------------------------------------------------------
>
> (Updated Jan. 24, 2014, 10:56 p.m.)
>
>
> Review request for giraph.
>
>
> Bugs: GIRAPH-792
>     https://issues.apache.org/jira/browse/GIRAPH-792
>
>
> Repository: giraph-git
>
>
> Description
> -------
>
> Currently we print nothing about job progress to command line. We should
> track which stage are we in and how far in it are we.
>
>
> Diffs
> -----
>
>   giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java 86823ed
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java c8b7d36
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 63f38df
>   giraph-core/src/main/java/org/apache/giraph/graph/ComputeCallable.java 1fe1d10
>   giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java f31d99e
>   giraph-core/src/main/java/org/apache/giraph/job/CombinedWorkerProgress.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/job/GiraphJob.java 40670bb
>   giraph-core/src/main/java/org/apache/giraph/job/HaltApplicationUtils.java 28b5781
>   giraph-core/src/main/java/org/apache/giraph/job/JobProgressTracker.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java 78487ef
>   giraph-core/src/main/java/org/apache/giraph/utils/CounterUtils.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java bc29b03
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java 8ec0453
>   giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallable.java 01a6fc5
>   giraph-core/src/main/java/org/apache/giraph/worker/WorkerProgress.java PRE-CREATION
>   giraph-core/src/main/java/org/apache/giraph/worker/WorkerProgressWriter.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/17336/diff/
>
>
> Testing
> -------
>
> mvn clean verify
>
> run on a cluster, checked there is no overhead (with 50 workers); sample output:
> 14/01/24 14:42:33 INFO job.JobProgressTracker: Data from 50 workers - Loading data: 315250000 vertices loaded, 0 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded
> 14/01/24 14:42:42 INFO job.JobProgressTracker: Data from 50 workers - Loading data: 441000000 vertices loaded, 64 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded
> 14/01/24 14:42:51 INFO job.JobProgressTracker: Data from 50 workers - Loading data: 494250000 vertices loaded, 234 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded
> 14/01/24 14:43:00 INFO job.JobProgressTracker: Data from 50 workers - Loading data: 498750000 vertices loaded, 247 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded
> 14/01/24 14:43:09 INFO job.JobProgressTracker: Data from 50 workers - Loading data: 499500000 vertices loaded, 249 vertex input splits loaded; 0 edges loaded, 0 edge input splits loaded
> 14/01/24 14:43:18 INFO job.JobProgressTracker: Data from 47 workers - Compute superstep 0: 6800000 out of 470000000 vertices computed; 0 out of 2350 partitions computed
> 14/01/24 14:43:27 INFO job.JobProgressTracker: Data from 50 workers - Compute superstep 0: 133200000 out of 500000000 vertices computed; 332 out of 2500 partitions computed
> 14/01/24 14:43:36 INFO job.JobProgressTracker: Data from 50 workers - Compute superstep 0: 304500000 out of 500000000 vertices computed; 1080 out of 2500 partitions computed
> 14/01/24 14:43:46 INFO job.JobProgressTracker: Data from 50 workers - Compute superstep 0: 467300000 out of 500000000 vertices computed; 2203 out of 2500 partitions computed
> 14/01/24 14:43:54 INFO job.JobProgressTracker: Data from 50 workers - Compute superstep 0: 500000000 out of 500000000 vertices computed; 2500 out of 2500 partitions computed
> 14/01/24 14:44:04 INFO job.JobProgressTracker: Data from 50 workers - Compute superstep 0: 500000000 out of 500000000 vertices computed; 2500 out of 2500 partitions computed
> 14/01/24 14:44:13 INFO job.JobProgressTracker: Data from 50 workers - Compute superstep 0: 500000000 out of 500000000 vertices computed; 2500 out of 2500 partitions computed
> 14/01/24 14:44:13 INFO mapred.ExpireTasks: Starting launching task sweep
> 14/01/24 14:44:21 INFO job.JobProgressTracker: Data from 50 workers - Compute superstep 0: 500000000 out of 500000000 vertices computed; 2500 out of 2500 partitions computed
> 14/01/24 14:44:30 INFO job.JobProgressTracker: Data from 50 workers - Compute superstep 0: 500000000 out of 500000000 vertices computed; 2500 out of 2500 partitions computed
> 14/01/24 14:44:39 INFO job.JobProgressTracker: Data from 50 workers - Compute superstep 0: 500000000 out of 500000000 vertices computed; 2500 out of 2500 partitions computed
> 14/01/24 14:44:48 INFO job.JobProgressTracker: Data from 13 workers - Compute superstep 1: 0 out of 130000000 vertices computed; 0 out of 650 partitions computed
> 14/01/24 14:44:57 INFO job.JobProgressTracker: Data from 50 workers - Compute superstep 1: 112600000 out of 500000000 vertices computed; 159 out of 2500 partitions computed
> 14/01/24 14:45:06 INFO job.JobProgressTracker: Data from 50 workers - Compute superstep 1: 268600000 out of 500000000 vertices computed; 1003 out of 2500 partitions computed
> 14/01/24 14:45:15 INFO job.JobProgressTracker: Data from 50 workers - Compute superstep 1: 418600000 out of 500000000 vertices computed; 2002 out of 2500 partitions computed
> 14/01/24 14:45:24 INFO job.JobProgressTracker: Data from 50 workers - Compute superstep 1: 499400000 out of 500000000 vertices computed; 2494 out of 2500 partitions computed
>
>
> Thanks,
>
> Maja Kabiljo
>
>
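
For reference, one way to read the time-based suggestion in the BspServiceWorker.java comment above is sketched below. This is only an illustration, not code from the patch: the class name TimeBasedProgressThrottle, the method shouldUpdate, and the 10-second write interval are all hypothetical, and the sketch assumes "interval" means the period at which progress is written to ZooKeeper.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical helper (not part of Giraph): allows a progress update only
 * when at least half of the write interval has passed since the last one.
 */
public class TimeBasedProgressThrottle {
  /** Minimum gap between two accepted updates, in milliseconds. */
  private final long minGapMillis;
  /** Time of the last accepted update; Long.MIN_VALUE means "never". */
  private final AtomicLong lastUpdateMillis = new AtomicLong(Long.MIN_VALUE);

  /**
   * @param writeIntervalMillis assumed period of the ZooKeeper writer
   */
  public TimeBasedProgressThrottle(long writeIntervalMillis) {
    this.minGapMillis = writeIntervalMillis / 2;
  }

  /**
   * @param nowMillis current time, e.g. System.currentTimeMillis()
   * @return true if the caller should refresh the progress now
   */
  public boolean shouldUpdate(long nowMillis) {
    long last = lastUpdateMillis.get();
    if (last != Long.MIN_VALUE && nowMillis - last < minGapMillis) {
      return false;
    }
    // If several threads race here, only the one whose CAS succeeds updates.
    return lastUpdateMillis.compareAndSet(last, nowMillis);
  }

  public static void main(String[] args) {
    // Assumed 10 s write interval, so updates are at least 5 s apart.
    TimeBasedProgressThrottle throttle = new TimeBasedProgressThrottle(10_000L);
    long[] times = {0L, 1_000L, 4_999L, 5_000L, 9_000L, 10_000L};
    for (long t : times) {
      System.out.println(t + " ms -> update: " + throttle.shouldUpdate(t));
    }
  }
}
```

Passing the timestamp in as a parameter keeps the check easy to test; a real caller would presumably pass System.currentTimeMillis() or use whatever time source the worker already has.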
