[ https://issues.apache.org/jira/browse/GIRAPH-810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865831#comment-13865831 ]
Maja Kabiljo commented on GIRAPH-810:
-------------------------------------
Nice addition, +1, committed!
> Giraph should track aggregate statistics over lifetime of the computation
> -------------------------------------------------------------------------
>
> Key: GIRAPH-810
> URL: https://issues.apache.org/jira/browse/GIRAPH-810
> Project: Giraph
> Issue Type: Improvement
> Affects Versions: 1.1.0
> Reporter: Rob Vesse
> Fix For: 1.1.0
>
> Attachments: GIRAPH-810.patch
>
>
> When Giraph completes a job it reports a set of information about the job
> like so:
> {noformat}
> Giraph Timers
> 2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main):
> Superstep 3 TriangleFindingComputation (ms)=102234
> 2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main):
> Superstep 2 TriangleFindingComputation (ms)=29419
> 2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main):
> Superstep 1 TriangleFindingComputation (ms)=34397
> 2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main):
> Input superstep (ms)=12642
> 2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main):
> Total (ms)=208962
> 2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main):
> Superstep 0 TriangleFindingComputation (ms)=4201
> 2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main):
> Shutdown (ms)=2698
> 2013-12-04 10:43:45,570 INFO org.apache.hadoop.mapred.JobClient (main):
> Setup (ms)=23351
> 2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):
> Zookeeper server:port
> 2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):
> ip-10-145-221-220.ec2.internal:22181=0
> 2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):
> Giraph Stats
> 2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):
> Aggregate edges=150000
> 2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):
> Sent message bytes=0
> 2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):
> Superstep=4
> 2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):
> Last checkpointed superstep=0
> 2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):
> Current workers=16
> 2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):
> Current master task partition=0
> 2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):
> Sent messages=0
> 2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):
> Aggregate finished vertices=1000
> 2013-12-04 10:43:45,571 INFO org.apache.hadoop.mapred.JobClient (main):
> Aggregate vertices=1000
> {noformat}
> The problem is that some of these statistics are not particularly helpful
> since they pertain only to the most recent superstep, namely Sent messages
> and Sent message bytes.
> I can understand that there is a reason for doing this, since the number of
> sent messages helps determine whether a computation should halt at a given
> superstep, but it would be useful if these were also tracked in aggregate
> over the lifetime of the computation.
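The request above can be illustrated with a minimal sketch that keeps the per-superstep values (still needed for the halting check) alongside running totals for the whole computation. This is not the code from GIRAPH-810.patch: the class name, method names, the "Aggregate sent messages" label, and the sample numbers are all hypothetical and chosen only for illustration.

```java
// Hypothetical sketch: these are NOT Giraph's actual classes or counter names.
final class LifetimeMessageStats {
    // Values for the most recent superstep only.
    private long lastSentMessages;
    private long lastSentMessageBytes;
    // Running totals over every superstep recorded so far.
    private long totalSentMessages;
    private long totalSentMessageBytes;

    /** Records the message counts reported for one completed superstep. */
    void recordSuperstep(long sentMessages, long sentMessageBytes) {
        if (sentMessages < 0 || sentMessageBytes < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        // Overwrite the per-superstep values...
        lastSentMessages = sentMessages;
        lastSentMessageBytes = sentMessageBytes;
        // ...and accumulate the lifetime totals.
        totalSentMessages += sentMessages;
        totalSentMessageBytes += sentMessageBytes;
    }

    long getLastSentMessages() { return lastSentMessages; }
    long getLastSentMessageBytes() { return lastSentMessageBytes; }
    long getTotalSentMessages() { return totalSentMessages; }
    long getTotalSentMessageBytes() { return totalSentMessageBytes; }

    public static void main(String[] args) {
        LifetimeMessageStats stats = new LifetimeMessageStats();
        // Made-up per-superstep values, for illustration only.
        stats.recordSuperstep(1000, 8000);
        stats.recordSuperstep(250, 2000);
        stats.recordSuperstep(0, 0); // the final superstep sends nothing

        System.out.println("Sent messages=" + stats.getLastSentMessages());
        System.out.println("Aggregate sent messages=" + stats.getTotalSentMessages());
    }
}
```

With only the per-superstep values, a job whose last superstep sends no messages reports zero, as in the log above. The running totals keep the lifetime figures available without changing what the halting check sees.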