[ 
https://issues.apache.org/jira/browse/GIRAPH-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837594#comment-13837594
 ] 

Rob Vesse commented on GIRAPH-808:
----------------------------------

The middle segment of progress could be better approximated using the number of 
vertices to be processed, i.e.:

CurrentSuperstepProgress = VerticesProcessed / TotalVertices
SuperstepsProgress = (CurrentSuperstep + 1 / SuperstepsSoFar)
ComputationProgress = CurrentSuperstepProgress * (SuperstepsProgress / (N - 1))

With this approach progress would not follow a linear trend: it would trend 
towards N during a superstep and then drop back down at the start of the next 
superstep. Provided Hadoop allows progress to change in this way, it would be a 
much better way of reporting the progression of a Giraph computation.
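
As a rough illustration of the per-superstep part only, the sketch below tracks 
CurrentSuperstepProgress = VerticesProcessed / TotalVertices and resets it at 
the start of each superstep, which produces the rise-and-drop pattern described 
above. The class and method names are hypothetical and are not Giraph or Hadoop 
APIs, and it assumes the total vertex count is known up front and stays fixed 
across supersteps.

```java
/** Hypothetical helper: fraction of vertices processed in the current superstep. */
final class SuperstepProgress {
    private final long totalVertices; // assumption: known in advance and constant
    private long verticesProcessed;

    SuperstepProgress(long totalVertices) {
        if (totalVertices < 0) {
            throw new IllegalArgumentException("totalVertices must be >= 0");
        }
        this.totalVertices = totalVertices;
    }

    /** Resets the counter; intended to be called at the start of each superstep. */
    void startSuperstep() {
        verticesProcessed = 0;
    }

    /** Records that count more vertices finished computing, capped at the total. */
    void recordProcessed(long count) {
        verticesProcessed = Math.min(totalVertices, verticesProcessed + count);
    }

    /** CurrentSuperstepProgress = VerticesProcessed / TotalVertices, in [0, 1]. */
    double fraction() {
        if (totalVertices == 0) {
            return 1.0; // empty graph: nothing left to process
        }
        return (double) verticesProcessed / totalVertices;
    }

    public static void main(String[] args) {
        SuperstepProgress progress = new SuperstepProgress(4);
        for (int superstep = 0; superstep < 2; superstep++) {
            progress.startSuperstep();
            for (int v = 0; v < 4; v++) {
                progress.recordProcessed(1);
                System.out.printf("superstep %d: %.2f%n", superstep, progress.fraction());
            }
        }
    }
}
```

Running main shows the value climbing to 1.00 within a superstep and falling 
back at the start of the next one, so a consumer of this value would need to 
tolerate non-monotonic progress.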


> Giraph should report progress more accurately when running on Map/Reduce
> ------------------------------------------------------------------------
>
>                 Key: GIRAPH-808
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-808
>             Project: Giraph
>          Issue Type: Improvement
>    Affects Versions: 1.0.0, 1.1.0
>            Reporter: Rob Vesse
>
> The current way that Giraph reports progress when running on Map/Reduce seems 
> rather flawed.  When running a Giraph program the map tasks are launched and 
> after initialisation their progress almost immediately goes to 100% and stays 
> there throughout.  So the only way to monitor progress is to 
> I appreciate that there is no way for Giraph to report accurate progress 
> since it does not know in advance how many super steps there will be but it 
> could report progress in a more useful way.
> For example:
> - First N percent of progress is the input phase, this part could likely be 
> accurately calculated by using standard Hadoop input APIs which Giraph input 
> is built on
> - Next N percent of progress is an estimation that trends towards the final 
> value but does not reach it until the computation has halted i.e. (Superstep 
> + 1) / (N - 1) so this will naturally trend towards N.  Once the computation 
> halts then the value becomes N
> - Last N percent of progress is the output phase; again, this part could 
> likely be calculated accurately and easily since Giraph knows how many items it 
> has to output
> What does anybody else think?
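
One way to picture the three-segment scheme from the issue description is the 
sketch below, which maps a fraction within each phase onto a slice of overall 
progress. Everything here is an assumption made for illustration: the class, 
the phase shares passed to the constructor, and in particular computeEstimate, 
which is a stand-in estimator that approaches but never reaches 1 and is not 
the (Superstep + 1) / (N - 1) expression from the issue.

```java
/** Hypothetical mapping of per-phase fractions onto overall task progress. */
final class PhasedProgress {
    enum Phase { INPUT, COMPUTE, OUTPUT }

    private final double inputShare;
    private final double computeShare;
    private final double outputShare;

    /** Shares are assumed weights for each phase and must sum to 1. */
    PhasedProgress(double inputShare, double computeShare, double outputShare) {
        if (inputShare < 0 || computeShare < 0 || outputShare < 0
                || Math.abs(inputShare + computeShare + outputShare - 1.0) > 1e-9) {
            throw new IllegalArgumentException("shares must be non-negative and sum to 1");
        }
        this.inputShare = inputShare;
        this.computeShare = computeShare;
        this.outputShare = outputShare;
    }

    /** Maps a fraction in [0, 1] within the given phase to overall progress. */
    double overall(Phase phase, double phaseFraction) {
        double f = Math.max(0.0, Math.min(1.0, phaseFraction));
        switch (phase) {
            case INPUT:
                return inputShare * f;
            case COMPUTE:
                return inputShare + computeShare * f;
            default: // OUTPUT
                return inputShare + computeShare + outputShare * f;
        }
    }

    /**
     * Stand-in estimate for the compute phase: grows with the superstep number
     * but stays below 1. The caller passes 1.0 to overall() once the
     * computation has halted.
     */
    static double computeEstimate(long superstep) {
        long s = Math.max(0L, superstep);
        return 1.0 - 1.0 / (s + 2.0);
    }
}
```

For example, with shares of 0.25, 0.5 and 0.25, superstep 0 would report 
overall(Phase.COMPUTE, computeEstimate(0)), and a halted computation would 
report overall(Phase.COMPUTE, 1.0) before the output phase begins.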



--
This message was sent by Atlassian JIRA
(v6.1#6144)
