On 1 May 2012, at 15:45, Avery Ching wrote:
> I wonder if the issues you are seeing are related to
> This shouldn't happen.
Good to know that that should not happen.
For my specific algorithm it happens all the time.
For small amounts of processing, the job finishes 2 minutes after the mappers
report 100%.
For larger amounts it can take 20 minutes or so. So there is definitely a
connection between the expected processing time of the job and the amount
of time that passes after the mappers report 100%.
I even had a pretty extreme case where most of the workers were restarted
after an hour, and I killed the job after 90 minutes.
In addition, the "100% map" status always appears about 14-15 minutes after
the job starts, regardless of the total processing time.
That may be due to the time it takes to read in the data, which is
consistently around 11 minutes for the "vertex input superstep".
(The data my job reads to construct the graph, and therefore its size, is
always the same; only the "configuration" of the algorithm changes.
In my case, the configuration consists of the set of start nodes and the
association between different start nodes and user ids.)
Should I attach a zip file of the log directory for the job that restarted
most of its workers after an hour?
I can attach it to the JIRA issue.