Re: GSoC-2015: Clustering [ODE-563]

sudharma subasinghe Fri, 27 Mar 2015 04:44:18 -0700

Hi Sathwik,

The database is already shared. To avoid the race condition during
deployment, I use a master-slave configuration. For that I introduce a new
table in the database that contains the md5sum and the package name. I think
a replicated file system is not needed, as each node can deploy the package
at run time by observing that table.
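
As a rough illustration of that table, here is a minimal Java sketch. The
names (DeploymentTable, publish, pendingFor) are hypothetical, an in-memory
map stands in for the shared database table, and how a node obtains the
package files themselves is left out:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the proposed (package name, md5sum) table.
// A ConcurrentHashMap replaces the shared database so the sketch runs alone.
final class DeploymentTable {
    private final Map<String, String> md5ByPackage = new ConcurrentHashMap<>();

    // Hex-encoded MD5 checksum of the package content.
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Master side: record the package once it has been deployed.
    void publish(String packageName, byte[] content) {
        md5ByPackage.put(packageName, md5Hex(content));
    }

    // Slave side: packages whose published md5sum differs from what this
    // node has deployed so far (deployedOnNode maps package name to md5sum).
    List<String> pendingFor(Map<String, String> deployedOnNode) {
        List<String> pending = new ArrayList<>();
        for (Map.Entry<String, String> row : new TreeMap<>(md5ByPackage).entrySet()) {
            if (!row.getValue().equals(deployedOnNode.get(row.getKey()))) {
                pending.add(row.getKey());
            }
        }
        return pending;
    }
}
```

In this sketch only the master calls publish, and every other node
periodically calls pendingFor with its own state, so no two nodes ever
decide independently to pick up the same new package.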


Thank you.

On 27 March 2015 at 15:13, sudharma subasinghe <[email protected]>
wrote:

> Hi Sathwik,
>
> Now I get the idea. Thanks for the information. I'll take this point into
> account.
>
> Thank you.
>
> On 27 March 2015 at 14:34, Sathwik B P <[email protected]> wrote:
>
>> Hi Sudharma,
>>
>> This is not about storing in the database.
>>
>> When a deployment is initiated to ODE, the process artifacts are first
>> stored in the file system (the WEB-INF/processes folder in the default
>> configuration). Then a deployment poller scans for new files on the file
>> store and invokes the BPEL compiler to create the Java object model from
>> the BPEL file (we refer to it as the OModel), which is written to the file
>> system with a file name ending in *.cbp*. Based on this object model, data
>> is populated in the database. Remember that the whole model is not stored
>> in the database. So on a restart of the machine hosting ODE, the deployment
>> poller will check for this *.cbp* file and load it into memory. As a
>> process instance is created, the activities are taken from this model and
>> fed into JaCOB, which is ODE's VPU.
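
To make the scanning step concrete, here is a small Java sketch of a poller
pass. It is not ODE's actual poller: the class and method names are
hypothetical, and it assumes that the compiled .cbp file sits next to its
.bpel source with the same base name:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of one poller pass over the deployment folder.
final class DeploymentScan {

    // Returns every .bpel file under deployDir that has no .cbp sibling yet,
    // i.e. the files a poller would still have to hand to the compiler.
    static List<Path> uncompiled(Path deployDir) throws IOException {
        try (Stream<Path> files = Files.walk(deployDir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".bpel"))
                .filter(p -> !Files.exists(cbpFor(p)))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    // Assumed naming rule: order.bpel compiles to order.cbp in the same folder.
    static Path cbpFor(Path bpel) {
        String name = bpel.getFileName().toString();
        String base = name.substring(0, name.length() - ".bpel".length());
        return bpel.resolveSibling(base + ".cbp");
    }
}
```

Because the scan depends only on what is on disk, every node that sees the
same folder gets the same answer, which is why the location of this file
system matters in a cluster.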
>>
>> So I am basically talking about this file system where the process
>> artifacts will be stored.
>>
>> Today this resides locally on the machine where the ODE war is deployed.
>> Think about a cluster: where would you suggest this file system should
>> live?
>>
>> regards,
>> sathwik
>>
>> On Fri, Mar 27, 2015 at 2:08 PM, sudharma subasinghe <
>> [email protected]>
>> wrote:
>>
>> > Hi Sathwik,
>> >
>> > Thank you for your feedback. I agree with you about the single point of
>> > failure. I think in a Hazelcast cluster, when the database goes down,
>> > writes in the cache are queued in a log file so that they can be
>> > persisted in the database once it is back up. Am I right?
>> >
>> > Thank you.
>> >
>> > On 27 March 2015 at 13:27, Sathwik B P <[email protected]> wrote:
>> >
>> > > Hi Sudharma,
>> > >
>> > > It's a good proposal :)
>> > >
>> > > I have one piece of information to provide. As you must be aware, ODE
>> > > stores its deployment artifacts on a file system.
>> > > In a cluster environment this file system should either be accessible
>> > > to all the nodes, or we need a distributed/replicated file system.
>> > >
>> > > I would recommend a replicated file system over a shared file system,
>> > > though. A shared file system would sit on a separate box in the
>> > > cluster and would be a single point of failure.
>> > > By using a replicated file system store we can provide maximum cluster
>> > > availability.
>> > >
>> > > What do you think, Tammo?
>> > >
>> > > regards,
>> > > sathwik
>> > >
>> > > On Fri, Mar 27, 2015 at 10:16 AM, sudharma subasinghe <
>> > > [email protected]
>> > > > wrote:
>> > >
>> > > > Hi Tammo,
>> > > >
>> > > > Thank you for the feedback. I'll complete the proposal according to
>> > > > your comments.
>> > > >
>> > > > Thank you.
>> > > >
>> > > > On 27 March 2015 at 04:45, Tammo van Lessen <[email protected]>
>> > > wrote:
>> > > >
>> > > > > Hi Sudharma,
>> > > > >
>> > > > > very good proposal. A minor comment on the paragraph about the
>> > > > > conflicts during deployment: There is already a marker file for
>> > > > > deployed processes, so it won't happen that a node tries to deploy
>> > > > > a process that is already known to the database, even if it has
>> > > > > changed. The race condition that needs to be avoided by the
>> > > > > clustering implementation is that two nodes take up newly added
>> > > > > processes at the very same time.
>> > > > >
>> > > > > Could you please add a short paragraph about your availability and
>> > > > > how much time you can commit for GSoC this summer? I'd love to also
>> > > > > see a deliverable that allows us to easily test ODE in a clustered
>> > > > > setup, e.g. using docker-compose. Would that fit into the "Testing
>> > > > > and develop" time slot?
>> > > > >
>> > > > > Thanks,
>> > > > >   Tammo
>> > > > >
>> > > > > On Thu, Mar 26, 2015 at 5:26 PM, sudharma subasinghe <
>> > > > > [email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi Tammo,
>> > > > > >
>> > > > > > I drafted the proposal. This is the link to the Google doc. It
>> > > > > > would be great if you could give feedback on it.
>> > > > > >
>> > > > > > https://docs.google.com/document/d/1H7cLekwUr2juNX2DFzgqq5FEkHZPDtFkHKz0aLWiB2k/edit?usp=sharing
>> > > > > >
>> > > > > > Thank you.
>> > > > > >
>> > > > > > On 26 March 2015 at 21:35, Tammo van Lessen <
>> [email protected]>
>> > > > > wrote:
>> > > > > >
>> > > > > > > Hi Sudharma,
>> > > > > > >
>> > > > > > > yes. Regarding 3), it is in particular about the isolation of
>> > > > > > > process instances. There must be a load balancer in front of
>> > > > > > > ODE, and the lock is to avoid the case where node one is
>> > > > > > > processing a process instance and node two receives a message
>> > > > > > > for the same process instance and starts processing it as well.
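
The isolation requirement can be sketched in Java as one lock per process
instance id. This is only a single-JVM illustration with hypothetical names
(InstanceLocks, withLock): java.util.concurrent locks stand in for the
cluster-wide lock, which in a real cluster would have to come from the
database or from the clustering framework:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch: serializes work per process instance id.
// ReentrantLock is local to one JVM and only stands in for a distributed lock.
final class InstanceLocks {
    private final ConcurrentMap<Long, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Runs work while holding the lock of the given instance. Fails with
    // IllegalStateException if the lock cannot be acquired within timeoutMs.
    <T> T withLock(long instanceId, long timeoutMs, Supplier<T> work) {
        ReentrantLock lock = locks.computeIfAbsent(instanceId, id -> new ReentrantLock());
        boolean acquired;
        try {
            acquired = lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for instance " + instanceId, e);
        }
        if (!acquired) {
            throw new IllegalStateException("instance " + instanceId + " is being processed elsewhere");
        }
        try {
            return work.get();
        } finally {
            lock.unlock();
        }
    }
}
```

Messages for different instances proceed in parallel, while a second message
for the same instance waits (or times out) until the first one is done.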
>> > > > > > >
>> > > > > > > Looking forward to your proposal.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > >   Tammo
>> > > > > > >
>> > > > > > > On Thu, Mar 26, 2015 at 4:52 PM, sudharma subasinghe <
>> > > > > > > [email protected]>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi Tammo,
>> > > > > > > >
>> > > > > > > > Thank you for the reply. I went through the JIRA thread that
>> > > > > > > > refers to this issue and extracted a few ideas from it. I
>> > > > > > > > think the implementation should cover the following points.
>> > > > > > > >
>> > > > > > > > 1) Support cluster awareness in the deployment phase
>> > > > > > > > 2) Improve ODE's scheduler
>> > > > > > > > 3) Implement a distributed lock to avoid concurrent
>> > > > > > > > modification in the cluster
>> > > > > > > >
>> > > > > > > > I am drafting a proposal including those points. I'll send it
>> > > > > > > > for your review soon.
>> > > > > > > >
>> > > > > > > > Thank you.
>> > > > > > > >
>> > > > > > > > On 26 March 2015 at 18:49, Tammo van Lessen <
>> > > [email protected]>
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi,
>> > > > > > > > >
>> > > > > > > > > ODE was originally designed to be run in a clustered
>> > > > > > > > > fashion; however, clustering has never been implemented in
>> > > > > > > > > ODE. The goal would be to integrate a clustering framework
>> > > > > > > > > like Hazelcast in order to add this functionality.
>> > > > > > > > >
>> > > > > > > > > The main integration points are the ODE scheduler and the
>> > > > > > > > > process store. The scheduler is already capable of handling
>> > > > > > > > > several nodes but needs the integration to know whether
>> > > > > > > > > cluster nodes are still present. The API currently
>> > > > > > > > > anticipates a heartbeat model; with Hazelcast this might
>> > > > > > > > > need to be changed or adapted. The other part is the process
>> > > > > > > > > store, which implements the (hot-)deployment that is
>> > > > > > > > > filesystem based. Under the assumption that a distributed
>> > > > > > > > > filesystem is used, the cluster implementation needs to
>> > > > > > > > > ensure that only one single node (the master) takes care of
>> > > > > > > > > new deployments, just in order to avoid multiple nodes doing
>> > > > > > > > > the same thing in parallel. Then there is also one lock that
>> > > > > > > > > needs to be distributed, either using database locks or a
>> > > > > > > > > distributed lock (e.g. from Hazelcast).
>> > > > > > > > >
>> > > > > > > > > Additional requirements would be the integration with our
>> > > > > > > > > config file, so that a cluster (and its nodes) can be
>> > > > > > > > > configured, as well as some basic monitoring. Also a basic
>> > > > > > > > > test environment, e.g. based on Docker, would be very good
>> > > > > > > > > to verify the approach.
>> > > > > > > > >
>> > > > > > > > > So I guess the steps would be:
>> > > > > > > > > 1. Research to find a suitable cluster framework (I think
>> > > > > > > > >    Hazelcast would be a good fit) and get familiar with ODE
>> > > > > > > > >    and this framework.
>> > > > > > > > > 2. Identify the integration points in ODE.
>> > > > > > > > > 3. Based on the chosen framework, develop approaches to
>> > > > > > > > >    serve these integration points (we need leader election
>> > > > > > > > >    for the store, a distributed lock for the runtime, and
>> > > > > > > > >    the information whether nodes are joining or leaving the
>> > > > > > > > >    cluster to be able to reschedule tasks from lost nodes),
>> > > > > > > > >    along with a distributed setup to test.
>> > > > > > > > > 4. Develop and test.
>> > > > > > > > > 5. Test.
>> > > > > > > > >
>> > > > > > > > > For questions regarding the integration points, please feel
>> > > > > > > > > free to ask here; I can give you some pointers.
>> > > > > > > > >
>> > > > > > > > > HTH,
>> > > > > > > > >   Tammo
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Tue, Mar 24, 2015 at 5:03 AM, sudharma subasinghe <
>> > > > > > > > > [email protected]>
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi,
>> > > > > > > > > >
>> > > > > > > > > > I am interested in this project, as I have basic knowledge
>> > > > > > > > > > of Apache Axis2, Apache ODE, and WS-BPEL, and I am
>> > > > > > > > > > currently studying them. I would appreciate it if you
>> > > > > > > > > > could provide more details on the project.
>> > > > > > > > > > Thank you
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > > Tammo van Lessen - http://www.taval.de
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > Tammo van Lessen - http://www.taval.de
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Tammo van Lessen - http://www.taval.de
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
