Hi all,

I am Thamayanthy Sripalan, a third-year undergraduate at the University of Moratuwa.
I am interested in taking on this project [1] for GSoC, as I am keenly interested in learning about clustering and have a basic understanding of Apache Axis2, Apache ODE, WS-BPEL and BPEL4WS.

As far as I can tell, there are three features to implement in order to cluster the ODE engine:

1. Having and managing a common database/process store for all the nodes
   - If a single database is shared among all the nodes, the same process definitions should be shared among all the nodes as well. If we allow every node to deploy processes, running instances might hit a null pointer exception because of a versioning problem. When the same process is deployed again, the previous version goes to the retired state and the newly deployed one becomes the active process. Previously created process instances would then be unable to find their process definition, because the deployed process name changes when the new version is deployed.
   - As a solution, we need to allow only one node (the master node) to deploy processes; the other nodes will only read/refer to the deployed processes.

2. Handling the assignment of jobs among the nodes
   - While one node is running an instance, no other node should access that instance: if two threads/process instances try to access the same job entry in the database, there will be a consistency problem.
   - To solve this, we can add another database table that holds the instance_id and the node that can handle/execute that job, so that the jobs assigned to each node can be managed. If the node a job is assigned to fails, there should be a mechanism to distribute that node's jobs to the other live nodes. I guess we could use Hazelcast for that.

3. Handling the multi-threaded environment
   - If one process instance has more than one service to invoke sequentially, those services can be executed on different nodes. So we need to allow multiple nodes to access the same process instance's entry in the database; in this case we cannot restrict the job to a single node.
   - To handle this, we can use a distributed lock when executing a job, so that only one thread at a time has access to a particular job entry. I think this locking mechanism is already implemented in single-node ODE. We need to make sure that the distributed lock keeps working once we cluster.

I have also commented about these features on the JIRA issue. I believe this is feasible to implement within the given time frame, and I would be glad to be selected to do this project for GSoC this year. Can I take the features above as the tasks for my project?

Links:
[1] https://issues.apache.org/jira/browse/ODE-563

Thank you
--
Thamayanthy Sripalan
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa.
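
P.S. To make the job-assignment idea (feature 2) a bit more concrete, here is a minimal sketch of the instance_id-to-node table and the failover step. It is plain Python, with an in-memory dict standing in for the database table. The names JobAssignments, assign and reassign_from are hypothetical and are not part of ODE or Hazelcast; the least-loaded placement rule is only my assumption about how jobs could be distributed.

```python
from collections import defaultdict


class JobAssignments:
    """In-memory stand-in for the proposed instance_id -> node table."""

    def __init__(self, nodes):
        self.alive = list(nodes)   # nodes currently able to run jobs
        self.owner = {}            # instance_id -> node that owns its jobs

    def _load(self):
        # Count how many instances each node currently owns.
        load = defaultdict(int)
        for node in self.owner.values():
            load[node] += 1
        return load

    def assign(self, instance_id):
        """Give the instance to the least-loaded live node (ties: first listed)."""
        if instance_id in self.owner:
            return self.owner[instance_id]   # already owned, keep the owner
        if not self.alive:
            raise RuntimeError("no live nodes left to take the instance")
        load = self._load()
        node = min(self.alive, key=lambda n: load[n])
        self.owner[instance_id] = node
        return node

    def reassign_from(self, failed_node):
        """Mark a node as failed and hand its instances to the remaining nodes."""
        self.alive.remove(failed_node)
        orphans = sorted(i for i, n in self.owner.items() if n == failed_node)
        for instance_id in orphans:
            del self.owner[instance_id]
            self.assign(instance_id)
        return orphans


table = JobAssignments(["node-a", "node-b"])
for instance_id in (1, 2, 3):
    table.assign(instance_id)
moved = table.reassign_from("node-a")   # node-a fails, node-b takes over
```

In a real cluster the dict would be the shared table, and detecting that a node has failed would be left to a membership service such as Hazelcast; neither part is shown here.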