Re: GSoC-2015: Clustering [ODE-563]

sudharma subasinghe Fri, 27 Mar 2015 01:40:27 -0700

Hi Sathwik,

Thank you for your feedback. I agree with you about the single point of
failure. I think that in a Hazelcast cluster, when the database goes down,
writes to the cache are queued in a log file so that they can be persisted
to the database once it is back up. Am I right?
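
To make the question concrete, here is a minimal sketch of the write-behind
idea in plain Java. It only illustrates the behaviour I am asking about: the
queue lives in memory rather than in a log file, and Store, FlakyStore and
WriteBehindBuffer are hypothetical names that belong to neither Hazelcast nor
ODE.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for the database; not an ODE or Hazelcast type. */
interface Store {
    void persist(String key, String value) throws Exception;
}

/** Hypothetical test double that can be switched "down" and "up". */
class FlakyStore implements Store {
    boolean up = true;
    final List<String> rows = new ArrayList<>();

    public void persist(String key, String value) throws Exception {
        if (!up) {
            throw new Exception("database down");
        }
        rows.add(key + "=" + value);
    }
}

/** In-memory write-behind buffer: failed writes stay queued until a flush succeeds. */
class WriteBehindBuffer {
    private final Queue<String[]> pending = new ArrayDeque<>();
    private final Store store;

    WriteBehindBuffer(Store store) {
        this.store = store;
    }

    /** Queue the write, then try to drain the queue right away. */
    synchronized void write(String key, String value) {
        pending.add(new String[] {key, value});
        flush();
    }

    /** Persist queued writes in order; stop at the first failure and keep the rest. */
    synchronized int flush() {
        int persisted = 0;
        while (!pending.isEmpty()) {
            String[] entry = pending.peek();
            try {
                store.persist(entry[0], entry[1]);
            } catch (Exception e) {
                break; // store is still down; retry on the next flush
            }
            pending.remove();
            persisted++;
        }
        return persisted;
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

Writes issued while the store is down stay in the queue, and a later flush()
persists them in their original order once the store accepts writes again.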


Thank you.

On 27 March 2015 at 13:27, Sathwik B P <[email protected]> wrote:

> Hi Sudharma,
>
> It's a good proposal :)
>
> I have one piece of information to share. As you may be aware, ODE
> stores its deployment artifacts on a file system.
> In a cluster environment this file system should either be accessible to
> all the nodes or we need a distributed/replicated file system.
>
> I would recommend a replicated file system over a shared file system,
> though. A shared file system would sit on a separate box in the cluster
> and would be a single point of failure.
> By using a replicated file system store we can provide maximum cluster
> availability.
>
> What do you think, Tammo?
>
> regards,
> sathwik
>
> On Fri, Mar 27, 2015 at 10:16 AM, sudharma subasinghe
> <[email protected]> wrote:
>
> > Hi Tammo,
> >
> > Thank you for the feedback. I'll complete the proposal according to
> > your comments.
> >
> > Thank you.
> >
> > On 27 March 2015 at 04:45, Tammo van Lessen <[email protected]>
> > wrote:
> >
> > > Hi Sudharma,
> > >
> > > Very good proposal. A minor comment on the paragraph about the
> > > conflicts during deployment: there is already a marker file for
> > > deployed processes, so it won't happen that a node tries to deploy a
> > > process that is already known to the database, even if it has changed.
> > > The race condition that the clustering implementation needs to avoid
> > > is two nodes taking up newly added processes at the very same time.
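
One way to picture avoiding that race is an atomic "first claim wins" step
before deployment. The sketch below is plain single-JVM Java, not ODE code:
DeploymentClaims and its methods are hypothetical, and in a real cluster the
map would have to be shared by all nodes (Hazelcast's distributed map
implements java.util.concurrent.ConcurrentMap, so it might be a candidate).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical registry of deployment claims, keyed by deployment unit name. */
class DeploymentClaims {
    // Assumption: in a cluster this map would be cluster-wide, not local.
    private final ConcurrentMap<String, String> ownerByUnit = new ConcurrentHashMap<>();

    /** Returns true only for the first node that claims the given deployment unit. */
    boolean tryClaim(String deploymentUnit, String nodeId) {
        // putIfAbsent is atomic, so two simultaneous claims cannot both succeed.
        return ownerByUnit.putIfAbsent(deploymentUnit, nodeId) == null;
    }

    /** Returns the node that holds the claim, or null if nobody has claimed it. */
    String ownerOf(String deploymentUnit) {
        return ownerByUnit.get(deploymentUnit);
    }
}
```

A node would deploy a newly added process only if tryClaim returns true, and
skip it otherwise.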
> > >
> > > Could you please add a short paragraph about your availability and
> > > how much time you can commit to GSoC this summer? I'd also love to
> > > see a deliverable that allows us to easily test ODE in a clustered
> > > setup, e.g. using docker-compose. Would that fit into the "Testing and
> > > develop" time slot?
> > >
> > > Thanks,
> > >   Tammo
> > >
> > > On Thu, Mar 26, 2015 at 5:26 PM, sudharma subasinghe
> > > <[email protected]> wrote:
> > >
> > > > Hi Tammo,
> > > >
> > > > I drafted the proposal. This is the link to the Google doc. It would
> > > > be great if you could give me feedback on it.
> > > >
> > > > https://docs.google.com/document/d/1H7cLekwUr2juNX2DFzgqq5FEkHZPDtFkHKz0aLWiB2k/edit?usp=sharing
> > > >
> > > > Thank you.
> > > >
> > > > On 26 March 2015 at 21:35, Tammo van Lessen <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi Sudharma,
> > > > >
> > > > > Yes. Regarding 3), it is in particular about the isolation of
> > > > > process instances. There must be a load balancer in front of ODE,
> > > > > and the lock is there to avoid the case where node one is
> > > > > processing a process instance while node two receives a message
> > > > > for the same process instance and starts processing it as well.
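
The isolation described here amounts to one lock per process instance. A
minimal single-JVM sketch follows, with the hypothetical class InstanceLocks
standing in for whatever distributed lock the cluster framework provides; it
is not ODE's actual locking code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical per-instance lock table; a cluster would need distributed locks instead. */
class InstanceLocks {
    private final ConcurrentMap<Long, Lock> locks = new ConcurrentHashMap<>();

    /** Runs the work while holding the lock that belongs to one process instance. */
    void withInstanceLock(long instanceId, Runnable work) {
        // One lock object per instance id; created atomically on first use.
        Lock lock = locks.computeIfAbsent(instanceId, id -> new ReentrantLock());
        lock.lock();
        try {
            work.run();
        } finally {
            lock.unlock(); // always release, even if the work throws
        }
    }
}
```

Messages for different instances can still be processed in parallel, because
each instance id maps to its own lock.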
> > > > >
> > > > > Looking forward to your proposal.
> > > > >
> > > > > Thanks,
> > > > >   Tammo
> > > > >
> > > > > On Thu, Mar 26, 2015 at 4:52 PM, sudharma subasinghe
> > > > > <[email protected]> wrote:
> > > > >
> > > > > > Hi Tammo,
> > > > > >
> > > > > > Thank you for the reply. I went through the JIRA thread that
> > > > > > refers to this issue and extracted a few ideas from it. I think
> > > > > > the implementation should cover the following points:
> > > > > >
> > > > > > 1) Support cluster awareness in the deployment phase
> > > > > > 2) Improve ODE's scheduler
> > > > > > 3) Implement a distributed lock to avoid concurrent modification
> > > > > >    in the cluster
> > > > > >
> > > > > > I am drafting a proposal that includes those points. I'll send it
> > > > > > for your review soon.
> > > > > >
> > > > > > Thank you.
> > > > > >
> > > > > > On 26 March 2015 at 18:49, Tammo van Lessen
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > ODE was originally designed to run in a clustered fashion;
> > > > > > > however, clustering has never actually been implemented. The
> > > > > > > goal would be to integrate a clustering framework like
> > > > > > > Hazelcast in order to add this functionality.
> > > > > > >
> > > > > > > The main integration points are the ODE scheduler and the
> > > > > > > process store. The scheduler is already capable of handling
> > > > > > > several nodes but needs the integration to know whether
> > > > > > > cluster nodes are still present. The API currently
> > > > > > > anticipates a heartbeat model; with Hazelcast this might need
> > > > > > > to be changed or adapted. The other part is the process
> > > > > > > store, which implements the filesystem-based
> > > > > > > (hot-)deployment. Under the assumption that a distributed
> > > > > > > filesystem is used, the cluster implementation needs to
> > > > > > > ensure that only one single node (the master) takes care of
> > > > > > > new deployments, just to avoid multiple nodes doing the same
> > > > > > > thing in parallel. Then there is also one lock that needs to
> > > > > > > be distributed, either using database locks or a distributed
> > > > > > > lock (e.g. from Hazelcast).
> > > > > > > Additional requirements would be the integration with our
> > > > > > > config file, so that a cluster (and its nodes) can be
> > > > > > > configured, as well as some basic monitoring. A basic test
> > > > > > > environment, e.g. based on Docker, would also be very good to
> > > > > > > verify the approach.
> > > > > > >
> > > > > > > So I guess the steps would be:
> > > > > > > 1. Research to find a suitable cluster framework (I think
> > > > > > >    Hazelcast would be a good fit) and get familiar with ODE
> > > > > > >    and this framework.
> > > > > > > 2. Identify the integration points in ODE.
> > > > > > > 3. Based on the chosen framework, develop approaches to serve
> > > > > > >    these integration points (we need leader election for the
> > > > > > >    store, a distributed lock for the runtime, and the
> > > > > > >    information whether nodes are joining or leaving the
> > > > > > >    cluster, to be able to reschedule tasks from lost nodes),
> > > > > > >    along with a distributed setup to test.
> > > > > > > 4. Develop and test.
> > > > > > > 5. Test.
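
The "reschedule tasks from lost nodes" part of step 3 can be sketched as
follows. This is an illustrative, single-JVM model only: JobReassigner,
assign, nodeLeft and jobsOf are hypothetical names, not the ODE scheduler API,
and the node-left notification is assumed to come from the cluster
framework's membership events.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical bookkeeping of which node currently owns which scheduled job. */
class JobReassigner {
    private final Map<String, List<String>> jobsByNode = new HashMap<>();

    /** Records that a job is scheduled on the given node. */
    synchronized void assign(String nodeId, String jobId) {
        jobsByNode.computeIfAbsent(nodeId, n -> new ArrayList<>()).add(jobId);
    }

    /** Called when a node leaves: hands its jobs to a surviving node, returns how many moved. */
    synchronized int nodeLeft(String lostNode, String survivor) {
        List<String> orphaned = jobsByNode.remove(lostNode);
        if (orphaned == null) {
            return 0; // the lost node had no jobs
        }
        jobsByNode.computeIfAbsent(survivor, n -> new ArrayList<>()).addAll(orphaned);
        return orphaned.size();
    }

    /** Returns a copy of the jobs currently owned by the given node. */
    synchronized List<String> jobsOf(String nodeId) {
        return new ArrayList<>(jobsByNode.getOrDefault(nodeId, new ArrayList<>()));
    }
}
```

How the survivor is chosen (for example, the current leader or the least
loaded node) is left open here.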
> > > > > > >
> > > > > > > For questions regarding the integration points, please feel
> > > > > > > free to ask here; I can give you some pointers.
> > > > > > >
> > > > > > > HTH,
> > > > > > >   Tammo
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 24, 2015 at 5:03 AM, sudharma subasinghe
> > > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I am interested in this project, as I have basic knowledge of
> > > > > > > > Apache Axis2, Apache ODE, and WS-BPEL, and I am currently
> > > > > > > > studying them. I would appreciate it if you could provide
> > > > > > > > more details on the project.
> > > > > > > > Thank you
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Tammo van Lessen - http://www.taval.de
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Tammo van Lessen - http://www.taval.de
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Tammo van Lessen - http://www.taval.de
> > >
> >
>
