[ 
https://issues.apache.org/jira/browse/HAMA-557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suraj Menon updated HAMA-557:
-----------------------------

    Status: Patch Available  (was: Open)

I tested the patch with the Rand Bench example on my 3-VM cluster. The 
following output shows the recovery in supersteps 2, 3 and 4: after each of 
them the reported superstep number drops to 0 and then resumes.

hama jar hama-examples-0.5.0-SNAPSHOT.jar  bench 5 5 5
12/08/01 06:28:56 DEBUG bsp.BSPJobClient: BSPJobClient.submitJobDir: hdfs://hadoop-1:54310/tmp/hama-hduser/bsp/system/submit_vjh6g
12/08/01 06:28:57 INFO bsp.BSPJobClient: Running job: job_201208010625_0002
12/08/01 06:29:00 INFO bsp.BSPJobClient: Current supersteps number: 0
12/08/01 06:29:45 INFO bsp.BSPJobClient: Current supersteps number: 1
12/08/01 06:29:48 INFO bsp.BSPJobClient: Current supersteps number: 2
12/08/01 06:29:54 INFO bsp.BSPJobClient: Current supersteps number: 0
12/08/01 06:30:09 INFO bsp.BSPJobClient: Current supersteps number: 3
12/08/01 06:30:15 INFO bsp.BSPJobClient: Current supersteps number: 0
12/08/01 06:30:30 INFO bsp.BSPJobClient: Current supersteps number: 4
12/08/01 06:30:42 INFO bsp.BSPJobClient: Current supersteps number: 0
12/08/01 06:30:57 INFO bsp.BSPJobClient: Current supersteps number: 5
12/08/01 06:30:57 INFO bsp.BSPJobClient: The total number of supersteps: 5
12/08/01 06:30:57 DEBUG bsp.Counters: Adding SUPERSTEPS
12/08/01 06:30:57 INFO bsp.BSPJobClient: Counters: 8
12/08/01 06:30:57 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
12/08/01 06:30:57 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=3
12/08/01 06:30:57 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
12/08/01 06:30:57 INFO bsp.BSPJobClient:     SUPERSTEPS=5
12/08/01 06:30:57 INFO bsp.BSPJobClient:     COMPRESSED_BYTES_SENT=718
12/08/01 06:30:57 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=15
12/08/01 06:30:57 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=712
12/08/01 06:30:57 INFO bsp.BSPJobClient:     COMPRESSED_BYTES_RECEIVED=474
12/08/01 06:30:57 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_SENT=150
12/08/01 06:30:57 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_RECEIVED=46

                
> Implement Checkpointing service in Hama
> ---------------------------------------
>
>                 Key: HAMA-557
>                 URL: https://issues.apache.org/jira/browse/HAMA-557
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core
>    Affects Versions: 0.6.0
>            Reporter: Suraj Menon
>            Assignee: Suraj Menon
>             Fix For: 0.6.0
>
>         Attachments: HAMA-505-557-610-611-v1.patch, 
> HAMA-557-ft-framework.patch
>
>
> Implement a checkpointing service in Apache Hama. My patches for HAMA-533 and 
> HAMA-534 are blocked on this.
> - Checkpointing should be done either as messages are sent or as they are 
> received. I prefer checkpointing on receipt, as we can achieve some 
> parallelism with asynchronous messages. Please comment if you disagree.
> - BSPMaster should hold the checkpoint status for each task. The checkpoint 
> status includes the superstep count and information on the files for which 
> checkpointing is complete.
> - MessageManager should notify the Checkpointer of each new message at BSPPeer.
> - Implement or reuse the MessageBundle class as splitClass in BSPPeerImpl for 
> recovery in initInput.
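
The per-task checkpoint status described in the issue (superstep count plus 
file information, held by the master) could be sketched roughly as below. 
This is only an illustration: CheckpointRegistry, CheckpointStatus, the 
method names and the string task ids are all hypothetical and are not taken 
from the patch or from the Hama code base.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical master-side registry; not part of the Hama API. */
class CheckpointRegistry {

    /** Immutable status: last checkpointed superstep and the file holding it. */
    static final class CheckpointStatus {
        final long superstep;
        final String file;

        CheckpointStatus(long superstep, String file) {
            this.superstep = superstep;
            this.file = file;
        }
    }

    // Task id -> latest completed checkpoint (assumed to be keyed by a string id).
    private final Map<String, CheckpointStatus> statusByTask =
            new ConcurrentHashMap<>();

    /** Records a completed checkpoint; reports older than the stored one are ignored. */
    void reportCompleted(String taskId, long superstep, String file) {
        statusByTask.merge(taskId, new CheckpointStatus(superstep, file),
                (old, neu) -> neu.superstep > old.superstep ? neu : old);
    }

    /** Returns the last checkpointed superstep for the task, or -1 if there is none. */
    long lastCheckpointedSuperstep(String taskId) {
        CheckpointStatus s = statusByTask.get(taskId);
        return s == null ? -1 : s.superstep;
    }

    /** Returns the checkpoint file for the task, or null if there is none. */
    String checkpointFile(String taskId) {
        CheckpointStatus s = statusByTask.get(taskId);
        return s == null ? null : s.file;
    }
}
```

In this sketch a recovering task would ask the registry for its last 
checkpointed superstep and file before replaying messages; how the real 
patch stores and distributes this state may differ.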

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
