[ 
https://issues.apache.org/jira/browse/HAMA-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13090828#comment-13090828
 ] 

ChiaHung Lin edited comment on HAMA-413 at 8/25/11 7:01 AM:
------------------------------------------------------------

Below is what I observe. 

GroomServer periodically checks whether TaskRunner has stopped running 
(!tip.runner.isAlive()); if so, it sets the phase to cleanup and reports back 
to BSPMaster. However, TaskRunner's run() may finish almost immediately if all 
it does is launch another thread that spawns the child process (i.e. BSPPeer). 
For example, in the HAMA-398 patch, TaskRunner.run() looks like this:

{code}
public void run() {
  ...
  // spawn the BSP peer child process
  BspChildRunner bspPeer = new BspChildRunner(bspArgs, workDir);
  bspPeer.start();
  ...
  // start() returns immediately, so within offerService() taskStatus will be
  // set to cleanup because runner.isAlive() is false, even though writing
  // data to HDFS may not have finished yet.
}
{code}

In the HAMA-398 v1 patch, adding a join(), which in turn makes use of 
Future.get(), should ideally have the same effect as the original procedure 
with waitFor().

{code}
public void run() {
  ...
  // spawn the BSP peer child process
  BspChildRunner bspPeer = new BspChildRunner(bspArgs, workDir);
  bspPeer.start();
  // wait until the BSP peer finishes its execution, including writing
  // data to HDFS
  bspPeer.join();
  ...
}
{code}
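
To make the difference concrete, here is a minimal, self-contained sketch. It is not Hama code: ChildRunner and Runner are hypothetical stand-ins for BspChildRunner and TaskRunner, a latch stands in for the child process that is still writing to HDFS, and plain Thread.join() stands in for the patch's join(). It only shows what a monitor calling isAlive() would observe in each case.

```java
import java.util.concurrent.CountDownLatch;

public class JoinSketch {

  // Hypothetical stand-in for BspChildRunner: blocks until released,
  // as if the child process were still writing its output.
  static class ChildRunner extends Thread {
    final CountDownLatch started = new CountDownLatch(1);
    final CountDownLatch release = new CountDownLatch(1);

    @Override
    public void run() {
      started.countDown();
      try {
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  // Hypothetical stand-in for TaskRunner.
  static class Runner extends Thread {
    final ChildRunner child = new ChildRunner();
    final boolean waitForChild;

    Runner(boolean waitForChild) {
      this.waitForChild = waitForChild;
    }

    @Override
    public void run() {
      child.start();
      if (waitForChild) {
        try {
          child.join(); // keep this thread alive until the child is done
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }
  }

  // Returns what a monitor would see from runner.isAlive() while the
  // child is still working.
  static boolean runnerAliveWhileChildWorks(boolean waitForChild) {
    Runner runner = new Runner(waitForChild);
    try {
      runner.start();
      runner.child.started.await();     // the child is now running
      if (!waitForChild) {
        runner.join();                  // run() has nothing left to wait for
      }
      boolean alive = runner.isAlive();
      runner.child.release.countDown(); // let the child finish
      runner.child.join();
      runner.join();
      return alive;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("without join: " + runnerAliveWhileChildWorks(false));
    System.out.println("with join:    " + runnerAliveWhileChildWorks(true));
  }
}
```

Running main prints false for the first case and true for the second: without the join, the runner thread is already dead while the child is still busy, which is the window in which the status would be switched to cleanup too early.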

  
> Remove limitation on the number of tasks
> ----------------------------------------
>
>                 Key: HAMA-413
>                 URL: https://issues.apache.org/jira/browse/HAMA-413
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp
>    Affects Versions: 0.3.0
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>             Fix For: 0.4.0
>
>         Attachments: HAMA-413_v01.patch, HAMA-413_v02.patch, 
> HAMA-413_v03.patch, HAMA-413_v05.patch, HAMA_413_v04.patch
>
>
> With the HAMA-410 patch, the BSPPeer object is constructed in the child 
> process, so we can now remove the limitation on the number of tasks.
> Here's the TODO list:
> 1. The number of tasks per groom should be configurable e.g., 
> 'bsp.local.tasks.maximum'.
> 2. The 'totalTaskCapacity' should be calculated at 
> BSPMaster.getClusterStatus().
> 3. When scheduling tasks, consider how to allocate them.
> 4. Each BSPPeer should know all created peers of Hama cluster by job. It can 
> be listed based on actions of GroomServer.
> 5. In examples, 'cluster.getGroomServers()' can be changed to 
> 'cluster.getMaxTasks()'.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
