Thanks Edward, I looked at the code and it looks like it's nicely abstracted. I see some comments in the code that say this happens only in YARN. Can you give me some additional info on what is different when running with YARN?
Another thing I wanted to check is what happens when a node fails: is the entire job restarted, or just the super step, or just the sub-task of the super step? I am interested in both the current behavior and what would be nice to have. Is there a document that describes the internal architecture?

Thanks,
Kishore G

On Wed, May 8, 2013 at 6:21 PM, Edward J. Yoon <[email protected]> wrote:

> Hi,
>
> This would be great collaboration. Since we pursue the pluggable
> interfaces for managing the synchronization[1], messenger, and job
> scheduling systems (we want to preserve the classic (standalone)
> cluster mode, while integrating with resource manager systems), the
> integration with Helix won't be difficult.
>
> 1. http://wiki.apache.org/hama/SyncService
>
> On Thu, May 9, 2013 at 7:01 AM, kishore g <[email protected]> wrote:
> > Hello,
> >
> > I am starting a discussion thread on potential pros/cons of using Helix
> > in Hama. I don't know the internal details of Hama, so please correct me
> > if something does not make sense.
> >
> > My source of information is http://wiki.apache.org/hama/Architecture and
> > a brief chat with Suraj at ApacheCon where he described the need for
> > barriers between super steps.
> >
> > Please read about Apache Helix here http://helix.incubator.apache.org/.
> >
> > Architecture-wise Helix maps pretty well with the components in Hama.
> > HelixController can be wrapped inside BSPMaster and GroomServer is the
> > PARTICIPANT in Helix terminology that wraps Helix Agent.
> >
> > The partitioning and assigning tasks to GroomServers can be done via
> > Helix APIs, it basically boils down to setting the idealstate for a
> > particular stage. Starting of the next step which basically depends on
> > all tasks in previous step being completed can be done by watching the
> > ExternalView.
> >
> > In the architecture wiki, I see that there is a plan to integrate with
> > Zookeeper for fault tolerance. Helix internally uses Zookeeper to store
> > the cluster state. So it might make it easier to make the tasks fault
> > tolerant and probably restartable as well at a task level instead of
> > job/stage level.
> >
> > We recently added a recipe in Helix to demonstrate the concept of
> > dependency between resources.
> >
> > http://helix.incubator.apache.org/recipes/task_dag_execution.html
> > Code:
> > https://github.com/apache/incubator-helix/tree/master/recipes/task-execution/src/main/java/org/apache/helix/taskexecution
> >
> > Let me know your thoughts.
> >
> > thanks,
> > Kishore G
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
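
To make the barrier and task-level restart ideas from the quoted mail a bit more concrete, here is a rough sketch in plain Java. It makes no Helix or Hama calls. The names SuperstepBarrier, TaskState, canAdvance and tasksToRestart are hypothetical, and the map only stands in for whatever a watcher on the ExternalView would report. The state names are also an assumption, since the actual states depend on the state model in use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Hama or Helix.
class SuperstepBarrier {

    // Simplified, assumed task states for one superstep.
    enum TaskState { PENDING, RUNNING, COMPLETED, FAILED }

    // Barrier check: the next superstep may start only when every
    // task of the current superstep has been observed as COMPLETED.
    static boolean canAdvance(Map<String, TaskState> observed) {
        if (observed.isEmpty()) {
            return false; // nothing reported yet, so do not advance
        }
        for (TaskState state : observed.values()) {
            if (state != TaskState.COMPLETED) {
                return false;
            }
        }
        return true;
    }

    // Task-level recovery: only the FAILED tasks are selected for a
    // re-run, instead of restarting the whole job or superstep.
    static List<String> tasksToRestart(Map<String, TaskState> observed) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, TaskState> entry : observed.entrySet()) {
            if (entry.getValue() == TaskState.FAILED) {
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<String, TaskState> view = new LinkedHashMap<>();
        view.put("task-0", TaskState.COMPLETED);
        view.put("task-1", TaskState.FAILED);

        System.out.println(canAdvance(view));     // false
        System.out.println(tasksToRestart(view)); // [task-1]

        // After task-1 is re-run and completes, the barrier opens.
        view.put("task-1", TaskState.COMPLETED);
        System.out.println(canAdvance(view));     // true
    }
}
```

Whether recovery should really happen per task, per superstep, or per job is exactly the open question above. The sketch only shows what the per-task option would look like.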
