Re: GSoC-2015: Clustering [ODE-563]

sudharma subasinghe Fri, 27 Mar 2015 06:54:00 -0700

Hi Sathwik,

In the shared database, there is a table that contains the md5sum and the
package name of each deployed package. When a new package is deployed to the
master node, the master calculates its md5sum value and stores the package
name with a new version number. If the package is a new version of an
already deployed one, the master checks the table for its md5sum value. As
it doesn't match any existing value, a new version is created in the table.


When a package is deployed to a slave node, the slave checks the table for
its md5sum value. If there is a match, the entry is read from the table. If
there is no match, the slave gives a warning and ignores the package.
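The table-based scheme described above can be sketched as follows. This is a minimal illustration that assumes an in-memory `sqlite3` database stands in for ODE's shared database. The table name `deployed_packages` and the functions `master_deploy` and `slave_deploy` are hypothetical and are not part of ODE.

```python
import hashlib
import sqlite3
import warnings
from typing import Optional, Tuple


def md5sum(data: bytes) -> str:
    # Hex MD5 digest of the package archive bytes.
    return hashlib.md5(data).hexdigest()


def create_table(db: sqlite3.Connection) -> None:
    # Hypothetical schema for the proposed shared table.
    db.execute(
        "CREATE TABLE IF NOT EXISTS deployed_packages ("
        "md5 TEXT PRIMARY KEY, name TEXT NOT NULL, version INTEGER NOT NULL)"
    )


def master_deploy(db: sqlite3.Connection, name: str, data: bytes) -> int:
    # Master: register the archive under a new version unless its md5sum is known.
    digest = md5sum(data)
    row = db.execute(
        "SELECT version FROM deployed_packages WHERE md5 = ?", (digest,)
    ).fetchone()
    if row is not None:
        return row[0]  # identical content is already registered
    (latest,) = db.execute(
        "SELECT COALESCE(MAX(version), 0) FROM deployed_packages WHERE name = ?",
        (name,),
    ).fetchone()
    db.execute(
        "INSERT INTO deployed_packages (md5, name, version) VALUES (?, ?, ?)",
        (digest, name, latest + 1),
    )
    db.commit()
    return latest + 1


def slave_deploy(
    db: sqlite3.Connection, name: str, data: bytes
) -> Optional[Tuple[str, int]]:
    # Slave: accept the archive only if the master already registered its md5sum.
    row = db.execute(
        "SELECT name, version FROM deployed_packages WHERE md5 = ?",
        (md5sum(data),),
    ).fetchone()
    if row is None:
        warnings.warn(f"{name}: md5sum not found in table, ignoring package")
        return None
    return row
```

Only the master writes to the table, so the lookup-then-insert in `master_deploy` needs no extra locking. If several nodes could write, the `md5` primary key would still reject duplicate content, but the version numbering could race.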

Using the table, each node can read the data as described above. As each
node has a separate WEB-INF/processes folder, that folder can be used as the
file system. So at run time a node can read the data from the table and load
the process model located in its local file system.

Could you let me know if there is anything wrong with this?

Thank you.

On 27 March 2015 at 18:41, Sathwik B P <[email protected]> wrote:

> Hi Sudharma,
>
> Sorry, I didn't understand your solution.
>
> Let me try to understand with a few questions.
>
> Assume there is a 3-node cluster: one Leader and two Slaves, with an LB
> in front.
>
> The payload of the deploy operation in the Deploy Service is the Base64
> encoded contents of the process artifacts (HelloWorld.zip).
> Assuming the request lands on the Leader, the archive has to be unpacked on
> the file system.
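The Leader's first step can be made concrete with a minimal sketch that decodes a Base64 zip payload and unpacks it into a processes directory. It uses only the Python standard library. `unpack_deploy_payload` is a hypothetical helper, and `processes_dir` is assumed to play the role of WEB-INF/processes. This is not the Deploy Service's actual code.

```python
import base64
import io
import zipfile
from pathlib import Path


def unpack_deploy_payload(payload_b64: str, package_name: str, processes_dir: Path) -> Path:
    """Decode a Base64 zip payload and extract it under processes_dir/package_name."""
    archive = base64.b64decode(payload_b64)
    target = processes_dir / package_name
    target.mkdir(parents=True, exist_ok=True)
    # Unpack the archive contents (e.g. the .bpel file and deploy.xml) onto the
    # local file system; untrusted input would also need member-name checks.
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        zf.extractall(target)
    return target
```

Whatever directory `processes_dir` points to is the file system the question is about. On a single node that directory is local, so the Slaves cannot see what the Leader unpacked.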
>
> What is this file system going to be?
> Slaves need access to this file system where the process artifacts are
> unpacked. When the Leader is unavailable and one of the Slaves is elected
> Leader, how would it get the process model to load and execute?
>
> regards,
> sathwik
>
> On Fri, Mar 27, 2015 at 5:12 PM, sudharma subasinghe <
> [email protected]>
> wrote:
>
> > Hi Sathwik,
> >
> > The database is already shared. To avoid the race condition during
> > deployment, I use the master-slave configuration, and I introduce a new
> > table in the database that contains the md5sum and package name. I think
> > there is no need for a replicated file system, as each node can deploy
> > the package at run time by observing that table.
> >
> > Thank you.
> >
> > On 27 March 2015 at 15:13, sudharma subasinghe <[email protected]>
> > wrote:
> >
> > > Hi Sathwik,
> > >
> > > Now I understand the idea. Thanks for the information. I'll consider
> > > that point.
> > >
> > > Thank you.
> > >
> > > On 27 March 2015 at 14:34, Sathwik B P <[email protected]> wrote:
> > >
> > >> Hi Sudharma,
> > >>
> > >> This is not about storing in the database.
> > >>
> > >> When a deployment is initiated in ODE, the process artifacts are first
> > >> stored in the file system (under the WEB-INF/processes folder in the
> > >> default configuration). Then a deployment poller scans the file store
> > >> for new files and invokes the BPEL compiler, which creates the Java
> > >> object model from the BPEL file (we refer to it as the OModel) and
> > >> writes it to the file system as a file ending in *.cbp*. Based on this
> > >> object model, data is populated in the database. Remember that the whole
> > >> model is not stored in the database. So on a restart of the machine
> > >> hosting ODE, the deployment poller will check for this *.cbp* file and
> > >> load it into memory. When a process instance is created, the activities
> > >> are taken from this model and fed into JaCOB, which is ODE's VPU.
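The deployment poller's behaviour can be illustrated with a single polling pass over the processes directory. This sketch makes three assumptions for illustration only: each package is a subdirectory, a `<package>.deployed` file next to it marks the package as deployed, and the names `poll_once` and `deploy` are hypothetical. The `deploy` callback stands in for the BPEL compiler and the database population step.

```python
from pathlib import Path
from typing import Callable, List

MARKER_SUFFIX = ".deployed"  # assumed marker naming, for illustration only


def poll_once(processes_dir: Path, deploy: Callable[[Path], None]) -> List[str]:
    """Deploy every package directory that has no marker file yet."""
    newly_deployed: List[str] = []
    for pkg in sorted(p for p in processes_dir.iterdir() if p.is_dir()):
        marker = processes_dir / (pkg.name + MARKER_SUFFIX)
        if marker.exists():
            continue  # deployed on an earlier pass (or before a restart)
        deploy(pkg)  # stand-in for: compile *.bpel to *.cbp, populate the DB
        marker.touch()  # written only after a successful deploy
        newly_deployed.append(pkg.name)
    return newly_deployed
```

Suppose two nodes run such a pass over the same shared directory. Both can see a package without a marker before either one writes it, which is why only one node should pick up new deployments in a cluster.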
> > >>
> > >> So I am talking about the file system where the process artifacts
> > >> will be stored.
> > >>
> > >> Today this resides locally on the machine where the ODE war is
> > >> deployed. Think about it in a cluster: where would you suggest placing
> > >> this file system?
> > >>
> > >> regards,
> > >> sathwik
> > >>
> > >> On Fri, Mar 27, 2015 at 2:08 PM, sudharma subasinghe <
> > >> [email protected]>
> > >> wrote:
> > >>
> > >> > Hi Sathwik,
> > >> >
> > >> > Thank you for your feedback. I agree with you about the single point
> > >> > of failure. I think that in a Hazelcast cluster, when the database
> > >> > goes down, writes to the cache are queued in a log file, so the writes
> > >> > can be persisted to the database once it is back up. Am I right?
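The behaviour asked about here is the write-behind pattern. The sketch below is a generic, single-process illustration with an in-memory queue, not Hazelcast's implementation. Two points should be checked in the Hazelcast documentation rather than inferred from this sketch: whether Hazelcast's MapStore write-behind mode keeps its queue in memory or in a log file, and what survives a node crash. `WriteBehindCache` and `store_write` are hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple


class WriteBehindCache:
    """Cache whose writes are queued and flushed to a backing store later."""

    def __init__(self, store_write: Callable[[str, str], None]) -> None:
        self._data: Dict[str, str] = {}
        self._pending: Deque[Tuple[str, str]] = deque()
        self._store_write = store_write

    def put(self, key: str, value: str) -> None:
        # Readers see the new value immediately; the database sees it later.
        self._data[key] = value
        self._pending.append((key, value))

    def get(self, key: str) -> str:
        return self._data[key]

    def flush(self) -> int:
        """Push queued writes in order; stop at the first failure and retry later."""
        flushed = 0
        while self._pending:
            key, value = self._pending[0]
            try:
                self._store_write(key, value)
            except ConnectionError:
                break  # database still down; keep the remaining writes queued
            self._pending.popleft()
            flushed += 1
        return flushed

    @property
    def pending(self) -> int:
        return len(self._pending)
```

In this sketch, queued writes live only in memory and are lost if the process dies before `flush` succeeds.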
> > >> >
> > >> > Thank you.
> > >> >
> > >> > On 27 March 2015 at 13:27, Sathwik B P <[email protected]> wrote:
> > >> >
> > >> > > Hi Sudharma,
> > >> > >
> > >> > > It's a good proposal :)
> > >> > >
> > >> > > I have one piece of information for you. As you must be aware, ODE
> > >> > > stores its deployment artifacts on a file system.
> > >> > > In a cluster environment, this file system should either be
> > >> > > accessible to all the nodes, or we need a distributed/replicated
> > >> > > file system.
> > >> > >
> > >> > > I would recommend a replicated file system over a shared file
> > >> > > system. A shared file system would sit on a separate box in the
> > >> > > cluster and would be a single point of failure.
> > >> > > By using a replicated file system store, we can provide maximum
> > >> > > cluster availability.
> > >> > >
> > >> > > What do you think, Tammo?
> > >> > >
> > >> > > regards,
> > >> > > sathwik
> > >> > >
> > >> > > On Fri, Mar 27, 2015 at 10:16 AM, sudharma subasinghe <[email protected]> wrote:
> > >> > >
> > >> > > > Hi Tammo,
> > >> > > >
> > >> > > > Thank you for the feedback. I'll complete the proposal according to
> > >> > > > your comments.
> > >> > > >
> > >> > > > Thank you.
> > >> > > >
> > >> > > > On 27 March 2015 at 04:45, Tammo van Lessen <[email protected]> wrote:
> > >> > > >
> > >> > > > > Hi Sudharma,
> > >> > > > >
> > >> > > > > Very good proposal. A minor comment on the paragraph about
> > >> > > > > conflicts during deployment: there is already a marker file for
> > >> > > > > deployed processes, so a node won't try to deploy a process that
> > >> > > > > is already known to the database, even if it has changed. The
> > >> > > > > race condition that the clustering implementation needs to avoid
> > >> > > > > is two nodes picking up newly added processes at the very same
> > >> > > > > time.
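One way to close the race in which two nodes pick up the same newly added process is an atomic claim in the shared database. Each node tries to insert a row keyed by the package name, and the primary key guarantees that only one insert succeeds. This is a minimal sketch that assumes a SQLite file stands in for the shared database. `deployment_claims`, `init_claims` and `try_claim` are hypothetical names. A Hazelcast distributed lock, or restricting deployment to an elected master, would be alternatives.

```python
import sqlite3


def init_claims(db_path: str) -> None:
    # Hypothetical claims table; the PRIMARY KEY is what makes a claim exclusive.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS deployment_claims ("
            "package TEXT PRIMARY KEY, node TEXT NOT NULL)"
        )
        conn.commit()
    finally:
        conn.close()


def try_claim(db_path: str, package: str, node: str) -> bool:
    """Return True only for the first node that claims this package."""
    # Each caller opens its own connection, as separate ODE nodes would.
    conn = sqlite3.connect(db_path, timeout=5.0)
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO deployment_claims (package, node) VALUES (?, ?)",
                (package, node),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # another node inserted the row first
    finally:
        conn.close()
```

A node that gets `False` simply skips the package. The winning node deploys it and writes the marker file.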
> > >> > > > >
> > >> > > > > Could you please add a short paragraph about your availability
> > >> > > > > and how much time you can commit to GSoC this summer? I'd also
> > >> > > > > love to see a deliverable that allows us to easily test ODE in a
> > >> > > > > clustered setup, e.g. using docker-compose. Would that fit into
> > >> > > > > the "Testing and develop" time slot?
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > >   Tammo
> > >> > > > >
> > >> > > > > On Thu, Mar 26, 2015 at 5:26 PM, sudharma subasinghe <
> > >> > > > > [email protected]>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Hi Tammo,
> > >> > > > > >
> > >> > > > > > I have drafted the proposal. This is the link to the Google doc.
> > >> > > > > > It would be great if you could give feedback on it.
> > >> > > > > >
> > >> > > > > > https://docs.google.com/document/d/1H7cLekwUr2juNX2DFzgqq5FEkHZPDtFkHKz0aLWiB2k/edit?usp=sharing
> > >> > > > > >
> > >> > > > > > Thank you.
> > >> > > > > >
> > >> > > > > > On 26 March 2015 at 21:35, Tammo van Lessen <[email protected]> wrote:
> > >> > > > > >
> > >> > > > > > > Hi Sudharma,
> > >> > > > > > >
> > >> > > > > > > Yes. Regarding 3), it is in particular about the isolation of
> > >> > > > > > > process instances. There must be a load balancer in front of
> > >> > > > > > > ODE, and the lock is there to avoid the case where node one is
> > >> > > > > > > processing a process instance while node two receives a
> > >> > > > > > > message for the same process instance and starts processing it
> > >> > > > > > > as well.
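The isolation requirement amounts to holding one lock per process instance while that instance is being processed. The sketch below only shows that granularity, using `threading.Lock` inside a single process. In a real cluster the lock would have to live in shared infrastructure, such as a database lock or a distributed lock like Hazelcast's, as discussed in this thread. `InstanceLocks` and the instance ids are hypothetical.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager
from typing import Iterator


class InstanceLocks:
    """One lock per process instance id (local stand-in for a distributed lock)."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the lock registry itself
        self._locks = defaultdict(threading.Lock)

    def _lock_for(self, instance_id: str):
        with self._guard:
            return self._locks[instance_id]

    def try_acquire(self, instance_id: str) -> bool:
        # Non-blocking: False means another worker is processing this instance.
        return self._lock_for(instance_id).acquire(blocking=False)

    def release(self, instance_id: str) -> None:
        self._lock_for(instance_id).release()

    @contextmanager
    def hold(self, instance_id: str) -> Iterator[None]:
        # Blocking variant: wait until the instance is free, then process it.
        with self._lock_for(instance_id):
            yield
```

A node that fails `try_acquire` would typically retry or requeue the message rather than process the same instance concurrently. Messages for other instances proceed in parallel.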
> > >> > > > > > >
> > >> > > > > > > Looking forward to your proposal.
> > >> > > > > > >
> > >> > > > > > > Thanks,
> > >> > > > > > >   Tammo
> > >> > > > > > >
> > >> > > > > > > On Thu, Mar 26, 2015 at 4:52 PM, sudharma subasinghe <
> > >> > > > > > > [email protected]>
> > >> > > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Hi Tammo,
> > >> > > > > > > >
> > >> > > > > > > > Thank you for your reply. I went through the JIRA thread for
> > >> > > > > > > > this issue and extracted a few ideas from it. I think the
> > >> > > > > > > > implementation should contain the following points:
> > >> > > > > > > >
> > >> > > > > > > > 1) Support cluster awareness in the deployment phase
> > >> > > > > > > > 2) Improve ODE's scheduler
> > >> > > > > > > > 3) Implement a distributed lock to avoid concurrent modification in the cluster
> > >> > > > > > > >
> > >> > > > > > > > I am drafting a proposal including those points. I'll send it
> > >> > > > > > > > for your review soon.
> > >> > > > > > > >
> > >> > > > > > > > Thank you.
> > >> > > > > > > >
> > >> > > > > > > > On 26 March 2015 at 18:49, Tammo van Lessen <[email protected]> wrote:
> > >> > > > > > > >
> > >> > > > > > > > > Hi,
> > >> > > > > > > > >
> > >> > > > > > > > > ODE was originally designed to run in a clustered fashion;
> > >> > > > > > > > > however, clustering has never been implemented in ODE. The
> > >> > > > > > > > > goal would be to integrate a clustering framework like
> > >> > > > > > > > > Hazelcast in order to add this functionality.
> > >> > > > > > > > >
> > >> > > > > > > > > The main integration points are the ODE scheduler and the
> > >> > > > > > > > > process store. The scheduler is already capable of handling
> > >> > > > > > > > > several nodes but needs the integration to know whether
> > >> > > > > > > > > cluster nodes are still present. The API currently
> > >> > > > > > > > > anticipates a heartbeat model; with Hazelcast this might
> > >> > > > > > > > > need to be changed or adapted. The other part is the process
> > >> > > > > > > > > store, which implements the file-system-based
> > >> > > > > > > > > (hot-)deployment. Assuming that a distributed file system is
> > >> > > > > > > > > used, the cluster implementation needs to ensure that only
> > >> > > > > > > > > one node (the master) handles new deployments, in order to
> > >> > > > > > > > > avoid multiple nodes doing the same thing in parallel. Then
> > >> > > > > > > > > there is also one lock that needs to be distributed, either
> > >> > > > > > > > > using database locks or a distributed lock (e.g. from
> > >> > > > > > > > > Hazelcast).
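In a heartbeat model, each node periodically records a timestamp, and a node whose last heartbeat is older than a timeout is treated as gone. The following is a minimal sketch of that check, with an injectable clock for testing. `HeartbeatMonitor` is a hypothetical name and does not represent ODE's scheduler API. With Hazelcast, cluster membership events could take the place of explicit heartbeats.

```python
import time
from typing import Callable, Dict, Set


class HeartbeatMonitor:
    """Tracks each node's last heartbeat and reports nodes that went silent."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, node: str) -> None:
        # Called periodically by (or on behalf of) each live node.
        self._last_seen[node] = self._clock()

    def alive(self) -> Set[str]:
        now = self._clock()
        return {n for n, t in self._last_seen.items() if now - t <= self.timeout}

    def lost(self) -> Set[str]:
        # Nodes seen at least once whose heartbeat is now older than the timeout.
        return set(self._last_seen) - self.alive()
```

The set returned by `lost()` is the information the scheduler needs in order to take over the jobs of nodes that have disappeared.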
> > >> > > > > > > > >
> > >> > > > > > > > > Additional requirements would be the integration with our
> > >> > > > > > > > > config file, so that a cluster (and its nodes) can be
> > >> > > > > > > > > configured, as well as some basic monitoring. Also, a basic
> > >> > > > > > > > > test environment, e.g. based on Docker, would be very good
> > >> > > > > > > > > to verify the approach.
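As an illustration of what config-file integration might look like, the sketch below reads cluster settings from Java-style `key=value` properties text. The keys `ode.clustering.enabled` and `ode.clustering.members` are invented for this example and are not taken from ODE's configuration. The parser is deliberately minimal and does not handle line continuations or escapes.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal key=value parser; skips blank lines and # or ! comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def cluster_settings(props: Dict[str, str]) -> Dict[str, object]:
    # Hypothetical keys, invented for illustration only.
    enabled = props.get("ode.clustering.enabled", "false").lower() == "true"
    members = [
        m.strip()
        for m in props.get("ode.clustering.members", "").split(",")
        if m.strip()
    ]
    return {"enabled": enabled, "members": members}
```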
> > >> > > > > > > > >
> > >> > > > > > > > > So I guess the steps would be:
> > >> > > > > > > > > 1. Research to find a suitable cluster framework (I think
> > >> > > > > > > > >    Hazelcast would be a good fit) and get familiar with ODE
> > >> > > > > > > > >    and this framework.
> > >> > > > > > > > > 2. Identify the integration points in ODE.
> > >> > > > > > > > > 3. Based on the chosen framework, develop approaches to serve
> > >> > > > > > > > >    these integration points (we need leader election for the
> > >> > > > > > > > >    store, a distributed lock for the runtime, and the
> > >> > > > > > > > >    information whether nodes are joining or leaving the
> > >> > > > > > > > >    cluster to be able to reschedule tasks from lost nodes),
> > >> > > > > > > > >    along with a distributed setup to test.
> > >> > > > > > > > > 4. Develop and test.
> > >> > > > > > > > > 5. Test.
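The leader election and rescheduling needs listed in step 3 can be sketched with two small functions. In `elect_leader`, every node picks the oldest surviving member by join order as leader. Hazelcast, for instance, treats its oldest member as the cluster master. As long as all nodes see the same member list, they reach the same answer. In `reschedule`, jobs owned by lost nodes are spread round-robin over the survivors. This is a simplified illustration under those assumptions. `elect_leader` and `reschedule` are hypothetical, and a real implementation would take the member list and join order from the clustering framework.

```python
from typing import Dict, List, Set


def elect_leader(members_by_join_order: List[str], alive: Set[str]) -> str:
    """Pick the oldest surviving member as leader."""
    for member in members_by_join_order:
        if member in alive:
            return member
    raise RuntimeError("no live members")


def reschedule(jobs: Dict[str, str], alive: Set[str]) -> Dict[str, str]:
    """Keep jobs on live owners; reassign jobs of lost nodes round-robin."""
    survivors = sorted(alive)
    if not survivors:
        raise RuntimeError("no live members")
    result: Dict[str, str] = {}
    i = 0
    for job, owner in sorted(jobs.items()):
        if owner in alive:
            result[job] = owner
        else:
            result[job] = survivors[i % len(survivors)]
            i += 1
    return result
```

When the Leader is lost, the next-oldest node becomes Leader and takes over the store, while the scheduler redistributes the lost node's jobs.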
> > >> > > > > > > > >
> > >> > > > > > > > > For questions regarding the integration points, please feel
> > >> > > > > > > > > free to ask here; I can give you some pointers.
> > >> > > > > > > > >
> > >> > > > > > > > > HTH,
> > >> > > > > > > > >   Tammo
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > On Tue, Mar 24, 2015 at 5:03 AM, sudharma subasinghe <
> > >> > > > > > > > > [email protected]>
> > >> > > > > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > Hi,
> > >> > > > > > > > > >
> > >> > > > > > > > > > I am interested in this project, as I have basic knowledge
> > >> > > > > > > > > > of Apache Axis2, Apache ODE and WS-BPEL, and I am currently
> > >> > > > > > > > > > studying them. So I would appreciate it if you could
> > >> > > > > > > > > > provide more details on the project.
> > >> > > > > > > > > > Thank you
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > --
> > >> > > > > > > > > Tammo van Lessen - http://www.taval.de
> > >> > > > > > > > >
