[ 
https://issues.apache.org/jira/browse/HDFS-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107266#comment-17107266
 ] 

Jinglun commented on HDFS-15340:
--------------------------------

Hi [~linyiqun], thanks for your nice comments!

 

A tricky problem when testing that 'the last stage is recovered successfully' 
is that I can't catch the moment the job is recovered by the scheduler. Once 
the job is recovered it is scheduled immediately, and curProcedure and 
lastProcedure change.

So I chose to do the test after the recovered job finishes. 'The last stage is 
recovered successfully' is equivalent to 'the recovered job continues with the 
last unfinished procedure'. In the test I do all the verification after the 
recovered job is done.

The recoverProcedure is fetched from the recovered job. If the job continues 
from the recoverProcedure, then none of the procedures before the 
recoverProcedure should be executed again, and all the procedures from the 
recoverProcedure onwards (including the recoverProcedure itself) should be 
executed.
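
The rule above can be sketched roughly as follows. This is only an 
illustration, not the test in the patch: the class RecoverOrderCheck, the 
Proc type, the continuedFrom helper and the procedure names are all 
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoverOrderCheck {
  /** Hypothetical stand-in for a procedure with an in-memory executed flag. */
  static final class Proc {
    final String name;
    boolean executed;

    Proc(String name) {
      this.name = name;
    }
  }

  /**
   * Returns true if no procedure before recoverIndex was executed and every
   * procedure from recoverIndex onwards (inclusive) was executed.
   */
  static boolean continuedFrom(List<Proc> procs, int recoverIndex) {
    for (int i = 0; i < procs.size(); i++) {
      boolean expected = i >= recoverIndex;
      if (procs.get(i).executed != expected) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Proc> procs = new ArrayList<>();
    for (String n : new String[] {"p0", "p1", "p2", "p3"}) {
      procs.add(new Proc(n));
    }
    // Simulate a job recovered at p2: only p2 and p3 run after recovery.
    int recoverIndex = 2;
    for (int i = recoverIndex; i < procs.size(); i++) {
      procs.get(i).executed = true;
    }
    System.out.println(continuedFrom(procs, recoverIndex)); // prints "true"
  }
}
```

Because the check runs only after the job is done, it does not depend on 
observing the moment of recovery.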

The member WaitProcedure#executed is not serialized/deserialized, so after the 
job is recovered WaitProcedure#executed is false. Only the procedures that are 
executed after the recovery have WaitProcedure#executed=true.
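
To show why a flag that is not serialized comes back as false, here is a 
small sketch. It assumes Java's built-in serialization with a transient field 
purely as a stand-in; the patch may serialize procedures in a different way. 
TransientFlagSketch, WaitProc and roundTrip are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientFlagSketch {
  /** Hypothetical procedure whose executed flag is skipped on serialization. */
  static final class WaitProc implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    // transient: not written out, so it resets to false after a round trip.
    transient boolean executed;

    WaitProc(String name) {
      this.name = name;
    }
  }

  /** Serializes the procedure to bytes and reads it back, like a recovery. */
  static WaitProc roundTrip(WaitProc p)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(p);
    }
    try (ObjectInputStream in = new ObjectInputStream(
        new ByteArrayInputStream(bytes.toByteArray()))) {
      return (WaitProc) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    WaitProc before = new WaitProc("p1");
    before.executed = true; // executed before the simulated crash
    WaitProc after = roundTrip(before);
    // prints "p1 executed=false"
    System.out.println(after.name + " executed=" + after.executed);
  }
}
```

With this behaviour, executed=true on a recovered procedure can only mean it 
ran after the recovery.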

 
+      lastProcedure = procedureTable.get(currentProcedureName);  <-- should be 
lastProcedureName I think
Yes, you are right! Thanks for pointing it out! Fixed it and uploaded v08.

> RBF: Implement BalanceProcedureScheduler basic framework
> --------------------------------------------------------
>
>                 Key: HDFS-15340
>                 URL: https://issues.apache.org/jira/browse/HDFS-15340
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Major
>         Attachments: HDFS-15340.001.patch, HDFS-15340.002.patch, 
> HDFS-15340.003.patch, HDFS-15340.004.patch, HDFS-15340.005.patch, 
> HDFS-15340.006.patch, HDFS-15340.007.patch, HDFS-15340.008.patch
>
>
> Patch in HDFS-15294 is too big to review so we split it into 2 patches. This 
> is the first one. Detail can be found at HDFS-15294.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
