The JIRA log shows that the committed patch lets each bsp peer report its 
status directly back to the master. One issue we may need to consider now is: 
how do we determine whether a groom server has failed? With the original 
mechanism, the groom server manages its tasks (bsp peers) and the master takes 
care of the groom servers. For instance, if a groom server fails, the master 
can reschedule all tasks assigned to that groom server onto another working 
one. With the current mechanism, the master must deal with bsp peers in 
addition to monitoring the activity of groom servers. Do we already have a 
plan for this? 
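
To make the original mechanism concrete, here is a rough sketch of what I 
mean by the master watching groom servers and rescheduling their tasks. This 
is not Hama code: the class GroomMonitor, its methods, and the timeout-based 
failure check are all hypothetical names and assumptions for illustration. 

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical master-side bookkeeping; not part of Hama's API. */
class GroomMonitor {
    private final long timeoutMillis;
    // Last heartbeat timestamp seen for each groom server.
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    // Tasks (bsp peers) currently assigned to each groom server.
    private final Map<String, List<String>> tasksByGroom = new HashMap<>();

    GroomMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called whenever a groom server reports in. */
    void heartbeat(String groom, long now) {
        lastHeartbeat.put(groom, now);
    }

    void assign(String groom, String taskId) {
        tasksByGroom.computeIfAbsent(groom, g -> new ArrayList<>()).add(taskId);
    }

    List<String> tasksOn(String groom) {
        return tasksByGroom.getOrDefault(groom, Collections.<String>emptyList());
    }

    /**
     * Treats every groom whose last heartbeat is older than the timeout as
     * failed and moves its tasks to the remaining grooms round-robin.
     * Returns a map of task id to new groom. If no groom is alive, the
     * orphaned tasks are simply left unassigned in this sketch.
     */
    Map<String, String> rescheduleFailed(long now) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (now - e.getValue() > timeoutMillis) {
                failed.add(e.getKey());
            }
        }
        List<String> orphaned = new ArrayList<>();
        for (String groom : failed) {
            lastHeartbeat.remove(groom);
            List<String> tasks = tasksByGroom.remove(groom);
            if (tasks != null) {
                orphaned.addAll(tasks);
            }
        }
        List<String> live = new ArrayList<>(lastHeartbeat.keySet());
        Collections.sort(live); // deterministic order for the round-robin
        Map<String, String> moved = new LinkedHashMap<>();
        for (int i = 0; i < orphaned.size() && !live.isEmpty(); i++) {
            String target = live.get(i % live.size());
            assign(target, orphaned.get(i));
            moved.put(orphaned.get(i), target);
        }
        return moved;
    }
}
```

The point of the sketch is that the master only tracks one timestamp per 
groom server. If bsp peers report directly, the master would need similar 
bookkeeping per task, which is the extra burden I am asking about. 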

-----Original message-----
From:Edward J. Yoon <[email protected]>
To:[email protected] <[email protected]>
Date:Fri, 26 Aug 2011 15:11:56 +0900
Subject:Re: Summary of problems with HAMA-413 and Discussion

Okay.

Sent from my iPhone

On 2011. 8. 26., at 2:49 PM, "ChiaHung Lin" <[email protected]> wrote:

> The latest patch (HAMA_NEW.patch) for HAMA-413 still seems to use the bsp 
> peer to report its status back to the master. 
> 
> +        umbilical.updateTaskStatusAndReport(taskid);
> 
> +  public void updateTaskStatusAndReport(TaskAttemptID taskid) {
> ...
> +    doReport(taskStatus);
> +  }
> 
> Is there any chance of reverting to a version in which the GroomServer 
> reports task status, so we can base the discussion on that version? Just to 
> ensure that the following issues do not result from the code change above. 
> 
> -----Original message-----
> From:Edward J. Yoon <[email protected]>
> To:[email protected]
> Date:Thu, 25 Aug 2011 19:43:48 +0900
> Subject:Summary of problems with HAMA-413 and Discussion
> 
> Today, I tested all Hama examples on my cluster of 32 nodes, with 96
> tasks. The Pi and Serialized Printing examples worked fine, but:
> 
> 1. Barrier synchronization is not working well (with the 'bench' example).
> 2. When an unexpected shutdown occurs, the ZK nodes (which are created by
> each BSPPeer) are not deleted. There's no way to clean them up before
> rebooting the server.
> 3. Graph examples are not working.
> 4. Too many status reports are sent between Groom and Master.
> 5. And, there are many code issues that can be improved.
> 
> Issues 1 and 2 are already reported (see HAMA-387, HAMA-407). Work on
> parts of issues 3, 4, and 5 has already been started by ChiaHung Lin.
> 
> Should all of these issues be fixed in HAMA-413, or should we just
> commit HAMA-413?
> 
> Thanks.
> -- 
> Best Regards, Edward J. Yoon
> @eddieyoon
> 
> 
> --
> ChiaHung Lin
> Department of Information Management
> National University of Kaohsiung
> Taiwan


--
ChiaHung Lin
Department of Information Management
National University of Kaohsiung
Taiwan
