I will try to explain how the model works, based on the examples I have
developed to test it.
*# Overview*
First of all, what kind of application is the guardian model meant for?
The model addresses the problem of concurrent exceptions in concurrent
applications, that is, applications in which two or more participants
execute at the same time and exchange messages in a cooperative action.
The test scenario is primary-backup with N backups. In this scenario we
have a client-server application with N participants on the server side.
The first participant to join on the server side becomes the primary server,
and the subsequent ones become backups. The primary receives a request from a
client, then sends a reply to the client and a copy of its state to the
backups. When the primary fails, the first backup in the queue becomes the
new primary. On the other hand, when a backup fails, the primary simply stops
sending updates to it.
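The role bookkeeping just described can be pictured with the plain-Java sketch below. It is only an illustration under the assumption of a single in-memory join queue; the class `PrimaryBackupGroup` and its methods are hypothetical and are not part of the guardian-model sandbox code, where these decisions are driven by exceptions and recovery rules instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of the primary-backup role rules; not the sandbox implementation.
class PrimaryBackupGroup {
    // Join order: the head is the primary, the rest are backups in queue order.
    private final Deque<String> members = new ArrayDeque<>();

    // The first participant to join becomes the primary; later ones are backups.
    void join(String participant) {
        members.addLast(participant);
    }

    String primary() {
        return members.peekFirst();
    }

    List<String> backups() {
        List<String> result = new ArrayList<>(members);
        if (!result.isEmpty()) {
            result.remove(0); // drop the primary
        }
        return result;
    }

    // Primary failure: drop the head, so the first backup in the queue takes over.
    void primaryFailed() {
        members.pollFirst();
    }

    // Backup failure: the primary simply stops sending updates to that backup.
    void backupFailed(String backup) {
        if (!backup.equals(members.peekFirst())) {
            members.remove(backup);
        }
    }
}
```

Keeping the primary at the head of the same queue makes failover a single removal: whoever is next in join order is automatically the new primary.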
*# SCDL file for the primary-backup with N backups scenario*
Since all participants need to know each other, we define:
<composite>
<component name="Participant1">
<implementation.java
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.NodeImpl"/>
<reference name="nodes" target="Participant2 Participant3 Participant4"/>
</component>
<component name="Participant2">
<implementation.java
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.NodeImpl"/>
<reference name="nodes" target="Participant1 Participant3 Participant4"/>
</component>
<component name="Participant3">
<implementation.java
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.NodeImpl"/>
<reference name="nodes" target="Participant1 Participant2 Participant4"/>
</component>
<component name="Participant4">
<implementation.java
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.NodeImpl"/>
<reference name="nodes" target="Participant1 Participant2 Participant3"/>
</component>
...
</composite>
Each participant is an instance of the NodeImpl class ([1]), which contains
three main methods: *execute*, *sendUpdate*, and *applyUpdate*. The first
one starts the participant's execution thread; it is annotated with
@OneWay, which makes its invocation asynchronous. The second method is used
by the primary server to send updates to the backups. Finally,
*applyUpdate* is used by the backups to apply the updates received from the
primary server.
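To make the effect of @OneWay concrete, here is a plain-Java sketch that does not use the Tuscany runtime at all: an ExecutorService stands in for the runtime's one-way dispatch, so the caller returns immediately while execute() keeps looping on another thread. The classes `SketchNode` and `OneWayDispatcher` are hypothetical illustrations, not the sandbox NodeImpl.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical participant with a long-running execute(); not the sandbox NodeImpl.
class SketchNode {
    private final CountDownLatch started = new CountDownLatch(1);
    private volatile boolean running = true;

    // Stands in for the participant's main loop (primary or backup service).
    void execute() {
        started.countDown();
        while (running) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    void stop() {
        running = false;
    }

    // Waits up to the given time for execute() to begin running.
    boolean awaitStarted(long millis) {
        try {
            return started.await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

// Imitates what @OneWay asks of the runtime: dispatch the call and do not wait for it.
class OneWayDispatcher {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    void callExecute(SketchNode node) {
        pool.submit(node::execute); // returns immediately
    }

    void shutdown() {
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Without the asynchronous dispatch, the component that starts a participant would block forever inside execute()'s `while (true)` loop.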
All communication related to exceptional behavior between the participants
goes through the guardian, which was implemented as a component. So we also
need to define the guardian in the SCDL file:
<composite>
<component name="Participant1">...</component>
<component name="Participant2">...</component>
<component name="Participant3">...</component>
<component name="Participant4">...</component>
<component name="GuardianGroup">
<implementation.java
class="org.apache.tuscany.sca.guardian.GuardianGroupImpl"/>
<property name="recovery_rules">src/main/resources/recoveryrules_nbackpus_concurrent.xml</property>
<property name="resolution_tree">src/main/resources/resolutionTree.xml</property>
</component>
</composite>
The guardian is an instance of the
org.apache.tuscany.sca.guardian.GuardianGroupImpl class, and provides the
org.apache.tuscany.sca.guardian.GuardianPrimitives as the main interface for
communication.
The GuardianPrimitives contains the following methods:
1. public void enableContext(Context context);
2. public void removeContext();
3. public void gthrow(GlobalExceptionInterface ex, List<String> participantList);
4. public boolean propagate(GlobalExceptionInterface ex);
5. public void checkExceptionStatus() throws GlobalException;
Methods 1 and 2 add and remove a context, respectively.
Method 3 is used whenever a participant wants to signal an external
exception, that is, an exception that needs to be handled cooperatively by a
set of participants.
Method 4 checks whether a specific exception needs to be propagated to
another context.
Method 5 checks whether there are pending exceptions to be handled.
These methods are the channel the participants use to communicate with each
other when they need to handle an exception cooperatively.
However, the participants do not communicate with the guardian directly.
Instead, they communicate with a guardian member, which is a mediator
between the participants and the guardian. Each participant is associated
with a guardian member. So the communication is established like this:
participant -> guardian member -> guardian, and guardian -> guardian member
-> participant.
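The participant -> guardian member -> guardian path can be sketched in a few lines of Java. This is a simplified, single-threaded illustration: `SimpleGroup` and `SimpleMember` are hypothetical stand-ins for GuardianGroupImpl and GuardianMemberImpl, exceptions are plain strings rather than GlobalExceptionInterface instances, delivery is synchronous (whereas gthrow() is asynchronous), and no recovery rules are applied.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical guardian group: the only party that knows every guardian member.
class SimpleGroup {
    private final Map<String, SimpleMember> members = new HashMap<>();

    // Each participant gets its own member, wired to this group.
    SimpleMember register(String participant) {
        SimpleMember member = new SimpleMember(this);
        members.put(participant, member);
        return member;
    }

    // guardian -> guardian member: hand the exception to each listed participant's member.
    void gthrow(String exception, List<String> participantList) {
        for (String participant : participantList) {
            SimpleMember member = members.get(participant);
            if (member != null) {
                member.deliver(exception);
            }
        }
    }
}

// Hypothetical guardian member: mediator between one participant and the group.
class SimpleMember {
    private final SimpleGroup group;
    private final Queue<String> pending = new ArrayDeque<>();

    SimpleMember(SimpleGroup group) {
        this.group = group;
    }

    // participant -> guardian member -> guardian
    void gthrow(String exception, List<String> participantList) {
        group.gthrow(exception, participantList);
    }

    // Called by the group; the exception waits here until the participant asks.
    void deliver(String exception) {
        pending.add(exception);
    }

    // guardian member -> participant: next pending exception, or null if none.
    String nextPending() {
        return pending.poll();
    }
}
```

The benefit of the extra hop is that a participant only holds a reference to its own member, so the group alone decides what each participant receives.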
The guardian member was implemented as a component too:
<composite>
...
<component name="GuardianMember1">
<implementation.java
class="org.apache.tuscany.sca.guardian.GuardianMemberImpl"/>
<reference name="guardian_group" target="GuardianGroup"/>
</component>
<component name="GuardianMember2">
<implementation.java
class="org.apache.tuscany.sca.guardian.GuardianMemberImpl"/>
<reference name="guardian_group" target="GuardianGroup"/>
</component>
<component name="GuardianMember3">
<implementation.java
class="org.apache.tuscany.sca.guardian.GuardianMemberImpl"/>
<reference name="guardian_group" target="GuardianGroup"/>
</component>
<component name="GuardianMember4">
<implementation.java
class="org.apache.tuscany.sca.guardian.GuardianMemberImpl"/>
<reference name="guardian_group" target="GuardianGroup"/>
</component>
<component name="GuardianGroup">...</component>
</composite>
The org.apache.tuscany.sca.guardian.GuardianMemberImpl class implements the
guardian member. Each guardian member has a reference to the guardian group,
and each participant has a reference to its respective guardian member.
The full SCDL file can be found at [2].
GuardianMemberImpl implements GuardianPrimitives, so the participants
communicate with each other through their respective guardian members, using
the methods of that interface.
*# Using the model*
So far, we have discussed three concepts of the guardian model: the
guardian group, the guardian members, and the guardian primitives. Another
important concept is the context. A context defines a place in the
participant where external exceptions are signaled and handled. A context
has two important attributes: a name, and the list of exceptions that can be
handled in that context. The class org.apache.tuscany.sca.guardian.Context
represents a context.
The primary-backup scenario has three contexts: MAIN, PRIMARY, and BACKUP,
where PRIMARY and BACKUP are nested in the MAIN context. A context is
activated with the guardian member's *enableContext()* method, and
*removeContext()* has the opposite effect. Once a context is activated, it
stays active until *removeContext()* is invoked or a nested context is
activated.
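Assuming contexts behave like a stack (enableContext() pushes, removeContext() pops, and the innermost entry is the active one), the bookkeeping can be sketched as below. `ContextStack` is a hypothetical helper, not the sandbox Context class; it also shows how a dotted context list such as MAIN.PRIMARY can arise from nesting.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical per-participant context bookkeeping (not the sandbox Context class).
class ContextStack {
    private final Deque<String> stack = new ArrayDeque<>();

    // Activating a context nests it inside the currently active one.
    void enableContext(String name) {
        stack.push(name);
    }

    // Removing the active context re-activates the enclosing one.
    void removeContext() {
        stack.pop();
    }

    // The innermost context is the active one.
    String active() {
        return stack.peek();
    }

    // Outermost-to-innermost path, e.g. "MAIN.PRIMARY".
    String contextList() {
        List<String> path = new ArrayList<>(stack); // innermost first
        Collections.reverse(path);
        return String.join(".", path);
    }
}
```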
The general structure of the NodeImpl class is shown below:
1. @OneWay
2. public void execute() {
3.     gm.enableContext(mainContext);
4.     while (true) {
5.         try {
6.             gm.checkExceptionStatus();
7.             if (role == PRIMARY) {
8.                 //Config as primary then...
9.                 primaryService();
10.             } else {
11.                 //Config as backup then...
12.                 backupService();
13.             }
14.         } catch (PrimaryExistsException ex) {...}
15.         catch (PrimaryFailedException ex) {...}
16.         catch (BackupFailedException ex) {...}
17.     }
18. }
19. private void primaryService() {
20.     while (true) {
21.         gm.enableContext(primaryContext);
22.         try {
23.             gm.checkExceptionStatus();
24.             //Process the request then...
25.             ...
26.             if (backupAvailable) {
27.                 //send updates to the backups
28.                 ...
29.             }
30.             //send the reply to the client
31.             ...
32.         } catch (PrimaryServiceFailureException ex) {...}
33.         catch (BackupFailedException ex) {...}
34.         catch (BackupJoinedException ex) {...}
35.         finally {
36.             gm.removeContext();
37.         }
38.     }
39. }
40. private void backupService() {
41.     while (true) {
42.         gm.enableContext(backupContext);
43.         try {
44.             gm.checkExceptionStatus();
45.             applyUpdate();
46.         } catch (ApplyUpdateFailureException ex) {...}
47.         finally {
48.             gm.removeContext();
49.         }
50.     }
51. }
As can be seen, the MAIN context is activated in rows 1-18, PRIMARY in rows
19-39, and BACKUP in rows 40-51. Each context is associated with a method,
and since *primaryService()* and *backupService()* are invoked inside
*execute()*, PRIMARY and BACKUP are nested in the MAIN context. When the
first participant joins the guardian group, its context list is
MAIN.PRIMARY; for the subsequent participants, the context list is
MAIN.BACKUP.
The core of this general structure is:
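//scope
{
    //Activate a context
    gm.enableContext(someContext);
    try {
        //Check for unhandled exceptions
        gm.checkExceptionStatus();
        //Application-specific code
        ...
    } catch (SomeGlobalException ex) {
        //Handle the external exceptions targeted at this context
        ...
    } finally {
        //Deactivate the context, re-activating the enclosing one
        gm.removeContext();
    }
}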
After activating a context, the participant must check for unhandled
exceptions with the guardian member's *checkExceptionStatus()* method. This
method checks for external exceptions that were raised by other
participants but affect the behavior of this participant. If there is an
exception to be handled, *checkExceptionStatus()* raises it; otherwise the
method simply returns.
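A minimal sketch of that check, assuming the guardian member keeps a queue of exceptions delivered to its participant: checkExceptionStatus() throws the oldest pending one, or returns normally when the queue is empty. `SketchGlobalException` is a simplified unchecked stand-in (carrying only a name) rather than the sandbox's GlobalException, and `PendingChecker` is hypothetical; target contexts are ignored here.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified stand-in for an external (global) exception; not the sandbox class.
class SketchGlobalException extends RuntimeException {
    SketchGlobalException(String name) {
        super(name);
    }
}

// Hypothetical guardian-member side of checkExceptionStatus().
class PendingChecker {
    private final Queue<SketchGlobalException> pending = new ArrayDeque<>();

    // Called when the guardian group delivers an exception for this participant.
    synchronized void deliver(SketchGlobalException ex) {
        pending.add(ex);
    }

    // Raises the oldest unhandled exception, or returns if there is none.
    synchronized void checkExceptionStatus() {
        SketchGlobalException ex = pending.poll();
        if (ex != null) {
            throw ex;
        }
    }
}
```

The methods are synchronized because, in the real model, deliveries arrive from the guardian while the participant's own thread is polling.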
Every time a participant wants to signal an external exception, it uses the
*gthrow()* method of its guardian member. The messages exchanged between the
participants, guardian members, and guardian group when gthrow() is invoked
are depicted in the sequence diagram [3] (see the "Progress on the GSoC
project: Supporting Concurrent Exception Handling at Tuscany SCA"
conversation thread for more details).
*# Recovery Rules XML File*
When a participant invokes *gthrow()* to signal an external exception to a
set of participants, the guardian group consults the user-defined recovery
rules to find out which exception should be raised in each participant in
the list, as well as the proper target context (in other words, the place
where the exception will be raised and handled).
An excerpt of the recovery rules XML file for this scenario (see the full
file at [4]):
<recovery_rules>
<!-- A new participant joins in the group -->
<rule name="Rule1"
signaled_exception="org.apache.tuscany.sca.guardian.JoinException">
<participant match="*.PRIMARY">
<throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupJoinedException"
target_context="PRIMARY"/>
</participant>
<participant match="SIGNALER">
<throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryExistsException"
target_context="MAIN" min_participant_joined="2"/>
</participant>
</rule>
...
</recovery_rules>
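How a participant entry such as `*.PRIMARY` or `SIGNALER` in Rule1 might be matched is sketched below. The semantics are an assumption inferred from the explanation that follows (`*` standing for any leading part of the dotted context list, `SIGNALER` for the participant that raised the exception); `RuleMatcher` is hypothetical and does not reproduce the sandbox's rule parser.

```java
import java.util.regex.Pattern;

// Hypothetical matcher for the "match" attribute of a <participant> entry.
class RuleMatcher {
    // contextList is the participant's dotted list, e.g. "MAIN.PRIMARY".
    static boolean matches(String pattern, String participant,
                           String contextList, String signaler) {
        if (pattern.equals("SIGNALER")) {
            return participant.equals(signaler);
        }
        // Glob to regex: literal pieces are quoted, each '*' matches any run of characters.
        String[] pieces = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(pieces[i]));
        }
        return contextList.matches(regex.toString());
    }
}
```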
When a participant joins the guardian group, the guardian raises a
JoinException indicating that a new participant has joined. Recovery rule
"Rule1" is applied when this exception is signaled. The guardian then adds a
BackupJoinedException, with target context PRIMARY, to all active
participants in a "*.PRIMARY" context (MAIN.PRIMARY matches this pattern),
and a PrimaryExistsException, with target context MAIN, to the participant
that raised the external exception (the SIGNALER), provided that at least
two participants have already joined the guardian group.
"Rule2" is applied when a participant raises a PrimaryFailedException. This
exception means that an internal error has occurred in the participant whose
PRIMARY context is active.
<rule name="Rule2"
signaled_exception="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException">
<participant match="*.PRIMARY">
<throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"
target_context="INIT_CONTEXT"/>
</participant>
<participant match="*.BACKUP">
<throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"
target_context="MAIN">
<affected_participants>FIRST</affected_participants>
</throw_exception>
</participant>
</rule>
The guardian adds a PrimaryFailedException, with target context
INIT_CONTEXT, to the participant in the PRIMARY context. INIT_CONTEXT is the
outermost context, enclosing all the contexts defined by the user; in this
application, it is the place where NodeImpl.execute() is invoked. Raising an
exception in this context therefore means that the participant has failed.
For the first backup in the list of backups, a PrimaryFailedException is
added with target context MAIN.
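The target_context attribute decides how far an exception travels outward. Assuming the stack view of contexts with INIT_CONTEXT always at the bottom, the decision can be sketched as "keep leaving the active context while it is not the target". `Propagation`, `mustPropagate`, and `unwind` are hypothetical names; the sandbox's propagate(GlobalExceptionInterface) may work differently, and in NodeImpl the unwinding itself happens through ordinary Java exception propagation and the `finally { gm.removeContext(); }` blocks.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of unwinding contexts until the exception's target context.
class Propagation {
    // True if an exception aimed at targetContext must leave activeContext.
    static boolean mustPropagate(String activeContext, String targetContext) {
        return !activeContext.equals(targetContext);
    }

    // Builds a context stack from outermost to innermost, e.g. INIT_CONTEXT, MAIN, PRIMARY.
    static Deque<String> stackOf(String... outermostFirst) {
        Deque<String> stack = new ArrayDeque<>();
        for (String c : outermostFirst) {
            stack.push(c);
        }
        return stack;
    }

    // Pops contexts until the target is active (never below INIT_CONTEXT); returns the handler.
    static String unwind(Deque<String> contexts, String targetContext) {
        while (contexts.size() > 1 && mustPropagate(contexts.peek(), targetContext)) {
            contexts.pop(); // leaving a context, like removeContext()
        }
        return contexts.peek();
    }
}
```

Under this reading, a failed primary unwinds all the way to INIT_CONTEXT (the participant stops), while the promoted backup stops at MAIN and re-enters its loop as the new primary.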
"Rule3" works like "Rule2", but it is applied to a BackupFailedException:
<!-- The Backup fails -->
<rule name="Rule3"
signaled_exception="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException">
<participant match="*.PRIMARY">
<throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException"
target_context="PRIMARY"/>
</participant>
<participant match="SIGNALER">
<throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException"
target_context="INIT_CONTEXT"/>
</participant>
</rule>
*# Putting the pieces together...*
In summary, the application works as follows:
1. A participant 'A' joins the guardian group with the MAIN context
active; the guardian signals a JoinException; no exceptions are delivered to
the participant, and the participant reaches the PRIMARY context.
2. A new participant 'B' joins the guardian group with the MAIN context
active; the guardian signals a JoinException and executes recovery rule
"Rule1"; a BackupJoinedException, with target context PRIMARY, is delivered
to participant 'A', and a PrimaryExistsException, with target context MAIN,
is delivered to participant 'B'.
3. When participant 'A' invokes *checkExceptionStatus()*, the
BackupJoinedException is raised in it, and it starts sending updates to the
backup.
4. When participant 'B' invokes *checkExceptionStatus()*, the
PrimaryExistsException is raised in it, and it becomes a backup.
After that, the primary sends updates to the backups, and the backups apply
the updates received from the primary.
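The four join steps can be replayed with the small simulation below. It is a sketch under simplifying assumptions: only Rule1 is modeled, the "*.PRIMARY" participant is taken to be the first one that joined (true as long as nobody has failed), and exceptions are written as hypothetical "Name@TargetContext" strings. `JoinFlowSketch` is not part of the sandbox code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the join steps; exceptions are "Name@TargetContext" strings.
class JoinFlowSketch {
    private final List<String> joined = new ArrayList<>();
    private final Map<String, String> roles = new LinkedHashMap<>();

    // Returns the exceptions delivered by Rule1 for this join, keyed by participant.
    Map<String, String> join(String participant) {
        joined.add(participant);
        Map<String, String> delivered = new LinkedHashMap<>();
        if (joined.size() >= 2) { // min_participant_joined="2"
            String primary = joined.get(0); // assumed to be the "*.PRIMARY" participant
            delivered.put(primary, "BackupJoinedException@PRIMARY");
            delivered.put(participant, "PrimaryExistsException@MAIN");
            roles.put(participant, "BACKUP"); // PrimaryExistsException makes it a backup
        } else {
            roles.put(participant, "PRIMARY"); // nothing delivered: it reaches PRIMARY
        }
        return delivered;
    }

    String role(String participant) {
        return roles.get(participant);
    }
}
```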
If an internal error occurs in the primary:
1. Participant 'A' fails, so a PrimaryFailedException is signaled to the
guardian.
2. The guardian executes recovery rule "Rule2".
3. The guardian adds a PrimaryFailedException, with target context
INIT_CONTEXT, to participant 'A'.
4. The guardian adds a PrimaryFailedException, with target context MAIN, to
the first backup in the backup list (in this case, participant 'B').
5. When participant 'A' invokes *checkExceptionStatus()*, the
PrimaryFailedException is raised in it and propagated up to INIT_CONTEXT,
which stops this participant.
6. When participant 'B' invokes *checkExceptionStatus()*, the
PrimaryFailedException is raised in it, and the participant becomes the
primary.
If an internal error occurs in the backup:
1. Participant 'B' fails, so a BackupFailedException is signaled to the
guardian.
2. The guardian executes recovery rule "Rule3".
3. The guardian adds a BackupFailedException, with target context PRIMARY,
to participant 'A'.
4. The guardian adds a BackupFailedException, with target context
INIT_CONTEXT, to participant 'B'.
5. When participant 'A' invokes *checkExceptionStatus()*, the
BackupFailedException is raised in it, and it removes participant 'B' from
its backup list.
6. When participant 'B' invokes *checkExceptionStatus()*, the
BackupFailedException is raised in it and propagated up to INIT_CONTEXT,
which stops this participant.
*# Concurrent Exceptions and the Resolution Tree*
Because gthrow() executes asynchronously, concurrent exceptions can occur.
When they do, the guardian searches a resolution tree for the lowest common
ancestor of the concurrent exceptions and then applies the recovery rules to
this resolved exception. If there is no lowest common ancestor, the guardian
applies the recovery rules to each exception sequentially.
The resolution tree for the discussed scenario is:
<resolution_trees>
<resolution_tree exception_level="1">
<exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryBackupFailedTogetherException">
<exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"/>
<exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException"/>
</exception>
</resolution_tree>
</resolution_trees>
In this way, when a primary and a backup fail together, the
PrimaryFailedException and BackupFailedException will be concurrent, and the
resolved exception will be the PrimaryBackupFailedTogetherException.
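The lookup can be sketched as a lowest-common-ancestor search over the tree. The sketch assumes each exception appears at most once in the tree and stores parent links by simple class name; `ResolutionTree`, `addChild`, and `resolve` are hypothetical, and parsing the XML file is left out. A null result corresponds to the case with no common ancestor, where the rules are applied to each exception in turn.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory resolution tree: child exception name -> parent name.
class ResolutionTree {
    private final Map<String, String> parent = new HashMap<>();

    void addChild(String parentName, String childName) {
        parent.put(childName, parentName);
    }

    // Path from a node up to its root, the node itself first.
    private List<String> pathToRoot(String name) {
        List<String> path = new ArrayList<>();
        for (String n = name; n != null; n = parent.get(n)) {
            path.add(n);
        }
        return path;
    }

    // Lowest common ancestor of all concurrent exceptions, or null if none.
    String resolve(List<String> concurrent) {
        Set<String> common = new LinkedHashSet<>(pathToRoot(concurrent.get(0)));
        for (String ex : concurrent.subList(1, concurrent.size())) {
            common.retainAll(pathToRoot(ex));
        }
        // The set keeps the first path's bottom-up order, so the first survivor is the lowest.
        return common.isEmpty() ? null : common.iterator().next();
    }
}
```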
Recovery rule "Rule4" handles this resolved exception:
<!-- The Primary and Backup fail together -->
<rule name="Rule4"
signaled_exception="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryBackupFailedTogetherException">
<participant match="*.PRIMARY">
<throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"
target_context="INIT_CONTEXT"/>
</participant>
<!-- Backup signaler -->
<participant match="*.BACKUP,SIGNALER">
<throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.BackupFailedException"
target_context="INIT_CONTEXT"/>
</participant>
<!-- Excluding the backup signaler -->
<participant match="*.BACKUP,!SIGNALER">
<throw_exception
class="org.apache.tuscany.sca.guardian.itests.primaryBackup.common.PrimaryFailedException"
target_context="MAIN">
<affected_participants>FIRST</affected_participants>
</throw_exception>
</participant>
</rule>
The guardian adds a PrimaryFailedException, with target context
INIT_CONTEXT, to the participant in the PRIMARY context. Similarly, the
guardian adds a BackupFailedException, with target context INIT_CONTEXT, to
the participant that is in the BACKUP context and has signaled the external
exception BackupFailedException. A PrimaryFailedException, with target
context MAIN, is added to the first backup in the backup list that has not
signaled any exception.
These actions terminate the participants that have failed and select a
backup to become the new primary server.
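Rule4 combines conditions with a comma, negates with '!', and narrows the result with FIRST. The sketch below evaluates that rule over a list of participants, assuming a comma means logical AND, '!' negates a single condition, and FIRST selects the earliest matching participant in join order; these semantics are inferred from the comments in the XML, and `Rule4Sketch` with its nested `Participant` class is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical evaluation of Rule4; participants are listed in join order.
class Rule4Sketch {
    static final class Participant {
        final String name;
        final String contextList; // e.g. "MAIN.PRIMARY"

        Participant(String name, String contextList) {
            this.name = name;
            this.contextList = contextList;
        }
    }

    // One condition: "SIGNALER", "!SIGNALER", or a glob such as "*.BACKUP".
    static boolean holds(String cond, Participant p, List<String> signalers) {
        boolean negate = cond.startsWith("!");
        String c = negate ? cond.substring(1) : cond;
        boolean result;
        if (c.equals("SIGNALER")) {
            result = signalers.contains(p.name);
        } else if (c.startsWith("*")) {
            result = p.contextList.endsWith(c.substring(1)); // "*.BACKUP" -> ends with ".BACKUP"
        } else {
            result = p.contextList.equals(c);
        }
        return negate ? !result : result;
    }

    // A comma-separated match attribute: every condition must hold.
    static boolean matches(String match, Participant p, List<String> signalers) {
        for (String cond : match.split(",")) {
            if (!holds(cond.trim(), p, signalers)) {
                return false;
            }
        }
        return true;
    }

    // Maps each affected participant to "Exception@TargetContext".
    static Map<String, String> apply(List<Participant> ps, List<String> signalers) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Participant p : ps) {
            if (matches("*.PRIMARY", p, signalers)) {
                out.put(p.name, "PrimaryFailedException@INIT_CONTEXT");
            } else if (matches("*.BACKUP,SIGNALER", p, signalers)) {
                out.put(p.name, "BackupFailedException@INIT_CONTEXT");
            }
        }
        // <affected_participants>FIRST</affected_participants>: only the earliest match.
        for (Participant p : ps) {
            if (matches("*.BACKUP,!SIGNALER", p, signalers)) {
                out.put(p.name, "PrimaryFailedException@MAIN");
                break;
            }
        }
        return out;
    }
}
```

With a primary P1 and backups P2, P3, P4, where P1 and P2 fail together, P1 and P2 are stopped at INIT_CONTEXT, P3 is promoted through MAIN, and P4 is left untouched.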
*# Ideas to improve the model implementation*
Although the implementation works, I think some modifications could bring
the model closer to Tuscany SCA.
1. As suggested previously, it could be a good idea to define the recovery
rules and the resolution tree as policies, instead of properties of the
guardian component.
2. Instead of using org.apache.tuscany.sca.guardian.GuardianGroupImpl as
the class of an implementation.java component, it might be better to define
a new implementation type, such as implementation.guardian, that has
org.apache.tuscany.sca.guardian.Guardian as its service interface and
accepts the recovery rules and resolution tree as policies.
3. Since every participant needs an associated guardian member, the
guardian members could be created automatically by the runtime. The user
would then only need to define a component of type implementation.guardian
and a reference to it; in the background, the runtime would create one
guardian member per participant and wire the components appropriately.
That's all for now. Let me know what you think. If you need more
explanation, just ask. :)
*# Links*
[1] http://svn.apache.org/repos/asf/tuscany/sandbox/dougsleite/guardian-model/src/test/java/org/apache/tuscany/sca/guardian/itests/primaryBackup/common/NodeImpl.java
[2] http://svn.apache.org/repos/asf/tuscany/sandbox/dougsleite/guardian-model/src/main/resources/primaryNbackups-concurrent.composite
[3] http://svn.apache.org/repos/asf/tuscany/sandbox/dougsleite/guardian-model/sequenceDiagram-externalException.jpg
[4] http://svn.apache.org/repos/asf/tuscany/sandbox/dougsleite/guardian-model/src/main/resources/recoveryrules_nbackpus_concurrent.xml
--
Douglas Siqueira Leite
Graduate student at University of Campinas (Unicamp), Brazil