Re: GSoC-2015: Clustering [ODE-563]

Sathwik B P Fri, 27 Mar 2015 02:07:44 -0700

Hi Sudharma,

This is not about storing in the database.


When a deployment is initiated to ODE, the process artifacts are first
stored in the file system (by default under the WEB-INF/processes folder).
A deployment poller then scans the file store for new files and invokes
the BPEL compiler, which creates the Java object model of the BPEL file
(we refer to it as the OModel) and writes it to the file system with a
file name ending in *.cbp*. Based on this object model, data is populated
in the database. Remember that the whole model is not stored in the
database. So on a restart of the machine hosting ODE, the deployment
poller checks for this *.cbp* file and loads it into memory. As a process
instance is created, the activities are taken from this model and fed
into JaCOB, which is ODE's VPU.
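
To make the scan step concrete, here is a minimal sketch in plain Java. It
is not ODE's actual poller; the class name, the method names and the
one-folder-per-deployment layout are assumptions made for illustration. It
only lists the deployment folders that contain a .bpel file but no
compiled .cbp file yet.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not ODE's real deployment poller.
public class DeploymentPollerSketch {

    // Returns the sub-directories of deployDir that hold a .bpel file
    // but no compiled .cbp file, i.e. the units still to be compiled.
    public static List<Path> findUncompiled(Path deployDir) throws IOException {
        List<Path> pending = new ArrayList<>();
        try (DirectoryStream<Path> units =
                 Files.newDirectoryStream(deployDir, Files::isDirectory)) {
            for (Path unit : units) {
                if (hasFileEndingWith(unit, ".bpel")
                        && !hasFileEndingWith(unit, ".cbp")) {
                    pending.add(unit);
                }
            }
        }
        return pending;
    }

    // True if dir directly contains at least one file whose name ends with suffix.
    public static boolean hasFileEndingWith(Path dir, String suffix) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*" + suffix)) {
            return files.iterator().hasNext();
        }
    }
}
```

A real poller would go on to compile each pending unit and populate the
database; the sketch stops at detection.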

So I am basically talking about this file system where the process
artifacts will be stored.

Today this resides locally on the machine where the ODE war is deployed.
Think about it in a cluster. Where would you suggest placing this file
system?
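
One thing to keep in mind if all nodes see the same folder is the race
Tammo mentioned, where two nodes pick up the same new process at the same
time. A rough sketch of one way to guard against it, assuming the file
system offers atomic file creation (many network file systems do not
guarantee this, so treat it as an illustration only). The class name and
the "deploy.claimed" marker name are made up for the example and are not
ODE's existing marker file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration: claim a deployment unit via an atomic marker file.
public class DeployClaimSketch {

    // Returns true if this node created the marker and therefore owns the
    // deployment; false if another node had already claimed the unit.
    public static boolean tryClaim(Path unitDir, String nodeId) throws IOException {
        Path marker = unitDir.resolve("deploy.claimed"); // assumed marker name
        try {
            // createFile checks for existence and creates in one atomic step.
            Files.createFile(marker);
        } catch (FileAlreadyExistsException e) {
            return false;
        }
        // Record the owner so that an operator can see which node took the unit.
        Files.write(marker, nodeId.getBytes(StandardCharsets.UTF_8));
        return true;
    }
}
```

A distributed lock or leader election from the clustering framework would
avoid relying on file system semantics altogether.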

regards,
sathwik

On Fri, Mar 27, 2015 at 2:08 PM, sudharma subasinghe <[email protected]>
wrote:

> Hi Sathwik,
>
> Thank you for your feedback. I agree with you about the single point of
> failure. I think that in a Hazelcast cluster, when the database goes down,
> writes in the cache are queued in a log file so that they can be persisted
> to the database once it is back up. Am I right?
>
> Thank you.
>
> On 27 March 2015 at 13:27, Sathwik B P <[email protected]> wrote:
>
> > Hi Sudharma,
> >
> > It's a good proposal :)
> >
> > I have one piece of information to provide you. As you must be aware,
> > ODE stores its deployment artifacts on a file system.
> > In a cluster environment this file system should either be accessible to
> > all the nodes or we need a distributed/replicated file system.
> >
> > I would recommend using a replicated file system rather than a shared
> > file system, though. A shared file system would be on a separate box in
> > the cluster and would be a single point of failure.
> > By using a replicated file system store we can provide maximum cluster
> > availability.
> >
> > What do you think, Tammo?
> >
> > regards,
> > sathwik
> >
> > On Fri, Mar 27, 2015 at 10:16 AM, sudharma subasinghe <
> > [email protected]
> > > wrote:
> >
> > > Hi Tammo,
> > >
> > > Thank you for the feedback. I'll complete the proposal according to
> > > your comments.
> > >
> > > Thank you.
> > >
> > > On 27 March 2015 at 04:45, Tammo van Lessen <[email protected]>
> > wrote:
> > >
> > > > Hi Sudharma,
> > > >
> > > > very good proposal. A minor comment on the paragraph about the
> > > > conflicts during deployment: There is already a marker file for
> > > > deployed processes, so it won't happen that a node tries to deploy a
> > > > process that is already known to the database, even if it has
> > > > changed. The race condition that needs to be avoided by the
> > > > clustering implementation is that two nodes take up newly added
> > > > processes at the very same time.
> > > >
> > > > Could you please add a short paragraph about your availability and
> > > > how much time you can commit for GSoC this summer? I'd love to also
> > > > see a deliverable that allows us to easily test ODE in a clustered
> > > > setup, e.g. using docker-compose. Would that fit into the "Testing
> > > > and develop" time slot?
> > > >
> > > > Thanks,
> > > >   Tammo
> > > >
> > > > On Thu, Mar 26, 2015 at 5:26 PM, sudharma subasinghe <
> > > > [email protected]>
> > > > wrote:
> > > >
> > > > > Hi Tammo,
> > > > >
> > > > > I drafted the proposal. This is the link to the Google doc. It
> > > > > would be great if you could give feedback on it.
> > > > >
> > > > > https://docs.google.com/document/d/1H7cLekwUr2juNX2DFzgqq5FEkHZPDtFkHKz0aLWiB2k/edit?usp=sharing
> > > > >
> > > > > Thank you.
> > > > >
> > > > > On 26 March 2015 at 21:35, Tammo van Lessen <[email protected]>
> > > > wrote:
> > > > >
> > > > > > Hi Sudharma,
> > > > > >
> > > > > > yes. Regarding 3) it is in particular the isolation of process
> > > > > > instances. There must be a load balancer in front of ODE, and
> > > > > > the lock is to avoid the case where node one is processing a
> > > > > > process instance and node two receives a message for the same
> > > > > > process instance and starts processing as well.
> > > > > >
> > > > > > Looking forward to your proposal.
> > > > > >
> > > > > > Thanks,
> > > > > >   Tammo
> > > > > >
> > > > > > On Thu, Mar 26, 2015 at 4:52 PM, sudharma subasinghe <
> > > > > > [email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Tammo,
> > > > > > >
> > > > > > > Thank you for the reply. I went through the JIRA thread that
> > > > > > > refers to this issue and extracted a few ideas from there. I
> > > > > > > think the implementation should cover the following points.
> > > > > > >
> > > > > > > 1) Support cluster awareness in the deployment phase
> > > > > > > 2) Improve ODE's scheduler
> > > > > > > 3) Implement a distributed lock to avoid concurrent
> > > > > > >    modification in the cluster
> > > > > > >
> > > > > > > I am drafting a proposal including those points. I'll send it
> > > > > > > for your review soon.
> > > > > > >
> > > > > > > Thank you.
> > > > > > >
> > > > > > > On 26 March 2015 at 18:49, Tammo van Lessen <
> > [email protected]>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > ODE was originally designed to be run in a clustered
> > > > > > > > fashion; however, this has never been implemented in ODE.
> > > > > > > > The goal would be to integrate a clustering framework like
> > > > > > > > Hazelcast in order to add this functionality.
> > > > > > > >
> > > > > > > > The main integration points are the ODE scheduler and the
> > > > > > > > process store. The scheduler is already capable of handling
> > > > > > > > several nodes but needs the integration to know whether
> > > > > > > > cluster nodes are still present. The API currently
> > > > > > > > anticipates a heartbeat model; with Hazelcast this might
> > > > > > > > need to be changed or adapted. The other part is the
> > > > > > > > process store, which implements the (hot-)deployment that
> > > > > > > > is filesystem based. Under the assumption that a
> > > > > > > > distributed filesystem is used, the cluster implementation
> > > > > > > > needs to ensure that only one single node (the master)
> > > > > > > > takes care of new deployments, just in order to avoid
> > > > > > > > multiple nodes doing the same thing in parallel. Then there
> > > > > > > > is also one lock that needs to be distributed, either using
> > > > > > > > database locks or a distributed lock (e.g. from Hazelcast).
> > > > > > > >
> > > > > > > > Additional requirements would be the integration with our
> > > > > > > > config file, so that a cluster (and its nodes) can be
> > > > > > > > configured, as well as some basic monitoring. Also a basic
> > > > > > > > test environment, e.g. based on Docker, would be very good
> > > > > > > > to verify the approach.
> > > > > > > >
> > > > > > > > So I guess the steps would be: 1. Research to find a
> > > > > > > > suitable cluster framework (I think Hazelcast would be a
> > > > > > > > good fit) and get familiar with ODE and this framework.
> > > > > > > > 2. Identify the integration points in ODE. 3. Based on the
> > > > > > > > chosen framework, develop approaches to serve these
> > > > > > > > integration points (we need leader election for the store,
> > > > > > > > a distributed lock for the runtime, and the information
> > > > > > > > whether nodes are joining or leaving the cluster to be able
> > > > > > > > to reschedule tasks from lost nodes) along with a
> > > > > > > > distributed setup to test. 4. Develop and test. 5. Test.
> > > > > > > >
> > > > > > > > For questions regarding the integration points please feel
> > > > > > > > free to ask here, I can give you some pointers.
> > > > > > > >
> > > > > > > > HTH,
> > > > > > > >   Tammo
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Mar 24, 2015 at 5:03 AM, sudharma subasinghe <
> > > > > > > > [email protected]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I am interested in this project, as I have basic
> > > > > > > > > knowledge of Apache Axis2, Apache ODE and WS-BPEL and I
> > > > > > > > > am currently studying them. I would appreciate it if you
> > > > > > > > > could provide more details on the project.
> > > > > > > > > Thank you
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Tammo van Lessen - http://www.taval.de
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Tammo van Lessen - http://www.taval.de
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Tammo van Lessen - http://www.taval.de
> > > >
> > >
> >
>
