Re: GSoC-2015: Clustering [ODE-563]

sudharma subasinghe Fri, 27 Mar 2015 12:06:47 -0700

Hi

In that case, I have introduced an option to notify all the nodes in the
cluster when a BPEL package is being deployed by the master node, so all
other nodes will deploy the process at the same moment. Can't that be the
solution?
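
To make the notification idea above concrete, here is a minimal sketch of
it together with the failover case from the quoted mail. It is only an
illustration in Python, not ODE code: the Node and Cluster classes, the
on_deploy callback and the choice to send the archive bytes along with the
notification are all assumptions made for this example.

```python
import hashlib


class Node:
    """One ODE instance with its own local deployment folder (modelled as a dict)."""

    def __init__(self, name):
        self.name = name
        self.deployed = {}  # package name -> md5 hex digest of the archive

    def on_deploy(self, package, archive):
        # Hypothetical callback: the node stores the archive locally and
        # records its checksum, standing in for a real deployment.
        self.deployed[package] = hashlib.md5(archive).hexdigest()

    def can_serve(self, package):
        return package in self.deployed


class Cluster:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.master = self.nodes[0]  # assumption: the first node is the master

    def deploy(self, package, archive):
        # The master deploys first, then notifies every other node.
        self.master.on_deploy(package, archive)
        for node in self.nodes:
            if node is not self.master:
                node.on_deploy(package, archive)

    def fail(self, node):
        # Drop a crashed node and promote another one if it was the master.
        self.nodes.remove(node)
        if node is self.master and self.nodes:
            self.master = self.nodes[0]


nodes = [Node("node1"), Node("node2"), Node("node3")]
cluster = Cluster(nodes)
cluster.deploy("HelloWorld", b"zip bytes of HelloWorld.zip")
cluster.fail(nodes[0])  # the master goes down after the deployment
print([n.name for n in cluster.nodes if n.can_serve("HelloWorld")])
# prints ['node2', 'node3']
```

In this sketch the remaining nodes can still serve the process because each
one received the archive at deployment time. It does not cover a node that
is down or joins the cluster after the notification was sent; such a node
would still need some way to obtain the archive.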


Thank you.

On 27 March 2015 at 23:52, Sathwik B P <sathwik...@gmail.com> wrote:

> Hi,
>
> Let's keep the ODE and md5sum concept aside.
>
> I will try to make it as simple as possible.
>
> I hope you understand that Tomcat server is a Web Application Container.
> Assume we have a cluster of Tomcat servers with Apache web server acting as
> LB.
>
> We want to deploy a web application (a war file) to this tomcat cluster.
> Tomcat server has a default deployment folder for deploying web
> applications named webapps folder.
>
> As part of the deployment, we copy the war file on one of the nodes named
> Node1 under webapps folder in the cluster hosting the tomcat server. The
> other nodes don't have this web application deployed under them.
>
> This new web application is now deployed successfully under Node1.
>
> Node1 starts accepting client requests and then goes down for obvious
> reasons. Now, the new client requests are to be routed to the remaining
> nodes in the cluster.
>
> But the remaining nodes have no information about this web application as
> it was not deployed to them and they start rejecting the client requests.
>
> Along these lines, just as Tomcat is a web application container, ODE is a
> business process container having its own deployment folder.
>
> I hope you understand the problem of deployment into the cluster now, be it
> with Tomcat or ODE.
>
> regards,
> sathwik
>
> On Fri, Mar 27, 2015 at 7:22 PM, sudharma subasinghe <
> suba...@cse.mrt.ac.lk>
> wrote:
>
> > Hi Sathwik,
> >
> > In the shared database, there is a table which contains the md5sum and
> > package name of each deployed package. When a new package is deployed
> > to the master node, it calculates the md5sum value and stores the
> > package name with a new version number. If the package is a new version
> > of an already deployed one, it checks the table for the md5sum value.
> > As it doesn't match the existing values, it creates a new version in
> > the table.
> >
> > When a package is deployed to a slave node, it checks the table for the
> > md5sum value and, if there is a match, it is read from the table. If
> > there is no match, it gives a warning and ignores the package.
> >
> > Using the table, each node can read the data as described above. As
> > each node has a separate WEB-INF/processes folder, it can be used as
> > the file system. So at run time a node can read the data and load the
> > process model, which is located in the local file system.
> >
> > Can you let me know if there is anything wrong?
> >
> > Thank you.
> >
> > On 27 March 2015 at 18:41, Sathwik B P <sathwik...@gmail.com> wrote:
> >
> > > Hi Sudharma,
> > >
> > > Sorry, I didn't understand your solution.
> > >
> > > Let me try to understand with a few questions.
> > >
> > > Assume there is a 3-node cluster: one Leader and two Slaves. There
> > > is an LB.
> > >
> > > The payload of the deploy operation in the Deploy Service is the
> > > Base64 encoded contents of the process artifacts (HelloWorld.zip).
> > > Assuming the request lands on the Leader, the archive has to be
> > > unpacked on the file system.
> > >
> > > What is this file system going to be?
> > > Slaves need access to this file system where the process artifacts
> > > are unpacked. When the Leader is unavailable and one of the Slaves is
> > > elected as Leader, how would it get the process model to load and
> > > execute?
> > >
> > > regards,
> > > sathwik
> > >
> > > On Fri, Mar 27, 2015 at 5:12 PM, sudharma subasinghe <
> > > suba...@cse.mrt.ac.lk>
> > > wrote:
> > >
> > > > Hi Sathwik,
> > > >
> > > > The database is already shared. To avoid the race condition in
> > > > deploying, I use the master-slave configuration. There I introduce
> > > > a new table containing the md5sum and package name in the database.
> > > > I think there is no need to use a replicated file system, as each
> > > > node can deploy the package at run time by observing that table.
> > > >
> > > > Thank you.
> > > >
> > > > On 27 March 2015 at 15:13, sudharma subasinghe <suba...@cse.mrt.ac.lk>
> > > > wrote:
> > > >
> > > > > Hi Sathwik,
> > > > >
> > > > > Now I got the idea. Thanks for the information. I'll consider the
> > > > > point.
> > > > >
> > > > > Thank you.
> > > > >
> > > > > On 27 March 2015 at 14:34, Sathwik B P <sathwik...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Hi Sudharma,
> > > > >>
> > > > >> This is not about storing in the database.
> > > > >>
> > > > > >> When a deployment is initiated to ODE, the process artifacts are
> > > > > >> first stored in the file system (under the WEB-INF/processes
> > > > > >> folder in the default configuration). Then a deployment poller
> > > > > >> scans for new files on the file store and invokes the BPEL
> > > > > >> compiler to create the Java object model from the BPEL file (we
> > > > > >> refer to it as the OModel), with a file name ending with *.cbp*
> > > > > >> on the file system. Based on this object model, data is populated
> > > > > >> in the database. Remember that the whole model is not stored in
> > > > > >> the database. So on a restart of the machine hosting ODE, the
> > > > > >> deployment poller will check for this *.cbp* file and load it
> > > > > >> into memory. As a process instance is created, the activities are
> > > > > >> taken from this model and fed into JaCOB, which is ODE's VPU.
> > > > > >>
> > > > > >> So I am basically talking about this file system where the process
> > > > > >> artifacts will be stored.
> > > > > >>
> > > > > >> Today this resides locally on the machine where the ODE war is
> > > > > >> deployed. Think about it in a cluster. Where would you suggest
> > > > > >> having this file system?
> > > > >>
> > > > >> regards,
> > > > >> sathwik
> > > > >>
> > > > >> On Fri, Mar 27, 2015 at 2:08 PM, sudharma subasinghe <
> > > > >> suba...@cse.mrt.ac.lk>
> > > > >> wrote:
> > > > >>
> > > > >> > Hi Sathwik,
> > > > >> >
> > > > > >> > Thank you for your feedback. I agree with you about the single
> > > > > >> > point of failure. I think that in the Hazelcast cluster, when the
> > > > > >> > database goes down, writes in the cache are queued in a log file
> > > > > >> > so the writes can be persisted in the database once it is back
> > > > > >> > up. Am I right?
> > > > >> >
> > > > >> > Thank you.
> > > > >> >
> > > > > >> > On 27 March 2015 at 13:27, Sathwik B P <sathwik...@gmail.com>
> > > > > >> > wrote:
> > > > >> >
> > > > >> > > Hi Sudharma,
> > > > >> > >
> > > > >> > > It's a good proposal :)
> > > > >> > >
> > > > > >> > > I have one piece of information to provide you. As you must be
> > > > > >> > > aware, ODE stores its deployment artifacts on a file system.
> > > > > >> > > In a cluster environment this file system should either be
> > > > > >> > > accessible to all the nodes or we need a distributed/replicated
> > > > > >> > > file system.
> > > > > >> > >
> > > > > >> > > I would recommend using a replicated file system rather than a
> > > > > >> > > shared file system, though. A shared file system would be on a
> > > > > >> > > separate box in the cluster and is a single point of failure.
> > > > > >> > > By using a replicated file system store we can provide maximum
> > > > > >> > > cluster availability.
> > > > > >> > >
> > > > > >> > > What do you think, Tammo?
> > > > >> > >
> > > > >> > > regards,
> > > > >> > > sathwik
> > > > >> > >
> > > > > >> > > On Fri, Mar 27, 2015 at 10:16 AM, sudharma subasinghe <
> > > > > >> > > suba...@cse.mrt.ac.lk> wrote:
> > > > >> > >
> > > > >> > > > Hi Tammo,
> > > > >> > > >
> > > > > >> > > > Thank you for the feedback. I'll complete the proposal as per
> > > > > >> > > > your comment.
> > > > >> > > >
> > > > >> > > > Thank you.
> > > > >> > > >
> > > > > >> > > > On 27 March 2015 at 04:45, Tammo van Lessen <tvanles...@gmail.com>
> > > > > >> > > > wrote:
> > > > >> > > >
> > > > >> > > > > Hi Sudharma,
> > > > >> > > > >
> > > > > >> > > > > very good proposal. A minor comment on the paragraph about the
> > > > > >> > > > > conflicts during deployment: There is already a marker file for
> > > > > >> > > > > deployed processes, so it won't happen that a node tries to
> > > > > >> > > > > deploy a process that is already known to the database, even if
> > > > > >> > > > > it has changed. The race condition that needs to be avoided by
> > > > > >> > > > > the clustering implementation is that two nodes take up newly
> > > > > >> > > > > added processes at the very same time.
> > > > > >> > > > >
> > > > > >> > > > > Could you please add a short paragraph about your availability
> > > > > >> > > > > and how much time you can commit for GSoC this summer? I'd love
> > > > > >> > > > > to also see a deliverable that allows us to easily test ODE in
> > > > > >> > > > > a clustered setup, e.g. using docker-compose. Would that fit
> > > > > >> > > > > into the "Testing and develop" time slot?
> > > > >> > > > >
> > > > >> > > > > Thanks,
> > > > >> > > > >   Tammo
> > > > >> > > > >
> > > > >> > > > > On Thu, Mar 26, 2015 at 5:26 PM, sudharma subasinghe <
> > > > >> > > > > suba...@cse.mrt.ac.lk>
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > Hi Tammo,
> > > > >> > > > > >
> > > > > >> > > > > > I drafted the proposal. This is the link to the Google doc. It
> > > > > >> > > > > > would be great if you could give feedback on this.
> > > > > >> > > > > >
> > > > > >> > > > > > https://docs.google.com/document/d/1H7cLekwUr2juNX2DFzgqq5FEkHZPDtFkHKz0aLWiB2k/edit?usp=sharing
> > > > >> > > > > >
> > > > >> > > > > > Thank you.
> > > > >> > > > > >
> > > > > >> > > > > > On 26 March 2015 at 21:35, Tammo van Lessen <tvanles...@gmail.com>
> > > > > >> > > > > > wrote:
> > > > >> > > > > >
> > > > >> > > > > > > Hi Sudharma,
> > > > >> > > > > > >
> > > > > >> > > > > > > yes. Regarding 3) it is in particular the isolation of process
> > > > > >> > > > > > > instances. There must be a load balancer in front of ODE, and
> > > > > >> > > > > > > the lock is to avoid the case where node one is processing a
> > > > > >> > > > > > > process instance and node two receives a message for the same
> > > > > >> > > > > > > process instance and starts processing as well.
> > > > >> > > > > > >
> > > > >> > > > > > > Looking forward to your proposal.
> > > > >> > > > > > >
> > > > >> > > > > > > Thanks,
> > > > >> > > > > > >   Tammo
> > > > >> > > > > > >
> > > > >> > > > > > > On Thu, Mar 26, 2015 at 4:52 PM, sudharma subasinghe <
> > > > >> > > > > > > suba...@cse.mrt.ac.lk>
> > > > >> > > > > > > wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > Hi Tammo,
> > > > >> > > > > > > >
> > > > > >> > > > > > > > Thank you for the reply. I went through the JIRA thread which
> > > > > >> > > > > > > > refers to this issue and extracted a few ideas from there. I
> > > > > >> > > > > > > > think the implementation should contain the following points.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > 1) Support cluster awareness in the deployment phase
> > > > > >> > > > > > > > 2) Improve ODE's scheduler
> > > > > >> > > > > > > > 3) Implement a distributed lock to avoid concurrent
> > > > > >> > > > > > > >    modification in the cluster
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > I am drafting a proposal including those points. I'll send it
> > > > > >> > > > > > > > for your review soon.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Thank you.
> > > > >> > > > > > > >
> > > > > >> > > > > > > > On 26 March 2015 at 18:49, Tammo van Lessen <tvanles...@gmail.com>
> > > > > >> > > > > > > > wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > > Hi,
> > > > >> > > > > > > > >
> > > > > >> > > > > > > > > ODE was originally designed to be run in a clustered fashion;
> > > > > >> > > > > > > > > however, this has never been implemented in ODE. The goal
> > > > > >> > > > > > > > > would be to integrate a clustering framework like Hazelcast in
> > > > > >> > > > > > > > > order to add this functionality.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > The main integration points are the ODE scheduler and the
> > > > > >> > > > > > > > > process store. The scheduler is already capable of handling
> > > > > >> > > > > > > > > several nodes but needs the integration to know whether
> > > > > >> > > > > > > > > cluster nodes are still present. The API currently anticipates
> > > > > >> > > > > > > > > a heartbeat model; with Hazelcast this might need to be
> > > > > >> > > > > > > > > changed or adapted. The other part is the process store, which
> > > > > >> > > > > > > > > implements the (hot-)deployment that is filesystem based.
> > > > > >> > > > > > > > > Under the assumption that a distributed filesystem is used,
> > > > > >> > > > > > > > > the cluster implementation needs to take care that only one
> > > > > >> > > > > > > > > single node (the master) is taking care of new deployments,
> > > > > >> > > > > > > > > just in order to avoid multiple nodes doing the same thing in
> > > > > >> > > > > > > > > parallel. Then there is also one lock that needs to be
> > > > > >> > > > > > > > > distributed, either using database locks or a distributed lock
> > > > > >> > > > > > > > > (e.g. from Hazelcast).
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Additional requirements would be the integration with our
> > > > > >> > > > > > > > > config file so that a cluster (and its nodes) can be
> > > > > >> > > > > > > > > configured, as well as some basic monitoring. Also a basic
> > > > > >> > > > > > > > > test environment, e.g. based on Docker, would be very good to
> > > > > >> > > > > > > > > verify the approach.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > So I guess the steps would be:
> > > > > >> > > > > > > > > 1. Research to find a suitable cluster framework (I think
> > > > > >> > > > > > > > >    Hazelcast would be a good fit) and getting familiar with
> > > > > >> > > > > > > > >    ODE and this framework.
> > > > > >> > > > > > > > > 2. Identify the integration points in ODE.
> > > > > >> > > > > > > > > 3. Based on the chosen framework, develop approaches to serve
> > > > > >> > > > > > > > >    these integration points (we need leader election for the
> > > > > >> > > > > > > > >    store, a distributed lock for the runtime and the
> > > > > >> > > > > > > > >    information whether nodes are joining or leaving the
> > > > > >> > > > > > > > >    cluster to be able to reschedule tasks from lost nodes)
> > > > > >> > > > > > > > >    along with a distributed setup to test.
> > > > > >> > > > > > > > > 4. Develop and test.
> > > > > >> > > > > > > > > 5. Test.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > For questions regarding the integration points please feel
> > > > > >> > > > > > > > > free to ask here, I can give you some pointers.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > HTH,
> > > > >> > > > > > > > >   Tammo
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > > >> > > > > > > > > On Tue, Mar 24, 2015 at 5:03 AM, sudharma subasinghe <
> > > > > >> > > > > > > > > suba...@cse.mrt.ac.lk> wrote:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > Hi,
> > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > I am interested in this project as I have sufficient basic
> > > > > >> > > > > > > > > > knowledge of Apache Axis2, Apache ODE and WS-BPEL, and I am
> > > > > >> > > > > > > > > > currently studying those. So I would appreciate it if you
> > > > > >> > > > > > > > > > could provide more details on the project.
> > > > > >> > > > > > > > > > Thank you
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > --
> > > > >> > > > > > > > > Tammo van Lessen - http://www.taval.de
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > > --
> > > > >> > > > > > > Tammo van Lessen - http://www.taval.de
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > --
> > > > >> > > > > Tammo van Lessen - http://www.taval.de
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>
