Hi all,

 

Per the wiki and a conversation with Mircea/Bela, I've attempted to capture
a more detailed view into the design.  

 

If it's useful, please feel free to use it.

 

Thanks,

 

Erik

 

From: [email protected]
[mailto:[email protected]] On Behalf Of Mircea Markus
Sent: Tuesday, October 02, 2012 12:58 PM
To: infinispan
Subject: [infinispan-dev] xsite state transfer design

 

Hi,

 

After discussing with Dan and Adrian, I've updated the xsite state transfer
document with a suggested design[1].

Note that this is out of scope for 5.2. Any review is very welcomed.

 

[1]
https://community.jboss.org/wiki/DesignForCrossSiteReplication#State_Transfe
r_between_sites 

 

 

Cheers,

-- 
Mircea Markus

Infinispan lead (www.infinispan.org)

 

 

 

Inter-site state transfer is the mechanism in which a running site will push 
its state to another site.  This currently is a manual process, applicable in 
the cases where a data center is being brought online or recovering a failed 
data center.  It follows the same design principals of non-blocking state 
transfer.  It is not enough to simply iterate over the keyset and apply the 
current values, since they may have been modified or deleted during normal 
operation.


JMX OPERATIONS
----------

Per cache manager:

1.  pushState( String siteName ):  This invokes, on a per-cache manager basis, 
the inter-site state transfer method.  This will use the configured backup 
policy of the cluster.
2.  pushState( String siteName, String cacheName ):  Similar to the above, but 
invokes it only using the policy of the defined cache.
2.  pushState( String siteName, String cacheName, int chunkSite ):  Similar to 
the above, with the number of keys to transmit at once
3.  stop( String siteName ):  This aborts any state transfer operation.
4.  stop( String siteName, String cacheName ):
5.  KeyState getStatus( String siteName, String cacheName ):  This returns the 
number of keys that have been successfully pushed to the remote site vs. the 
number of keys to push.

COMPONENTS:
----------

There are two main components of inter-site state transfer. There is the 
XSiteStateProvider on the pushing "site" side, and there is the 
XSiteStateConsumer on the remote node.  Note that these are logical components, 
and their implementation may not follow the same pattern.

XSiteStateProducer
  - OutboundTransferTask
XStiteStateConsumer
   - remote2LocalTx map
   - removedKey map
   - Running/not running state management (may be more than one variable)

COMMANDS
---------

- XSiteStateTransferRequestCommand (producer --> consumer):  This is invoked on 
a per-cache manager basis that alerts the receiving site to incoming requests 
and sets internal state accordingly
- XSiteStateRequestCommand (producer-->consumer):  Invoked to transmit current 
tx information to the remote site.
- XSiteFinalizeStateTransferCommand (producer --> consumer ): This is invoked 
when the state provider has generated all the state necessary for transmitting 
the data.

Note that the above are issued on a per-cache manager basis.  The receiving 
cluster does not know about which caches will be transmitted, since teh 
XSiteBackup impl tracks them on tx boundaries anyway.  It is the responsibility 
of the Producer to aggregate all status before issuing the finalize command.


DESIGN OVERVIEW
---------------
Conceptually, inter-site state transfer uses the NBST design pattern for 
producing and consuming state (and all its related semantics), while leveraging 
the XSite implementation for replaying modifications on the receiving site.

To start the inter-site state transfer mechanism, a system administrator will 
connect to any data grid node in the reference site and issue a JMX command 
(admin console, JMX).  It is important to note that the state transfer may be 
invoked from any node on a local cluster.  

After invoking this command, the XSiteStateProvider sends a the 
XSiteRequestCommand to the SiteMaster of the remote site: .When the 
XSiteStateProvider on the remote site receives this message, it sets up its 
internal state and replies.  The local site can start pushing state.

The XSiteStateConsumer will calculate the set of nodes that can generate the 
state.  These nodes will be own the primary segments for a set of keys.  This 
is a subset of the total nodes in the local cluster.  A great deal of the state 
provider implementation is already complete.  This can leverage the following 
class for the necessary functionality: 

https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/statetransfer/OutboundTransferTask.java#L217

During this operation, the current transaction state (the current "prepares") 
from the ORIGINATING site are pushed first (similar to the NBST method) in a 
XSiteStateRequest command.  The XSiteStateConsumer will process these tx 
information and place these commands into a local map.  The key is the 
GlobalTransaction as generated by the pushing site.  The object is a composite 
of the LocalTransaction and the current status:

- prepare_received:
- commit_received:
- committed

i.e. ConcurrentMap< GlobalTransaction, CompositeTxStatus > remote2LocalTx map.

For each state, the behavior is as following:

- prepare_received, committed:  Ignore this message, as it was already handled.
- commit_received: We grab the local transaction and commit it, and set the 
state to "committed"
- No entry: Start a local transaction and add it to the map with the status 
"prepare_received"

AFTER the transaction state is pushed, the XSiteStateConsumer will ask for the 
rest of the state.

The state is applied to the remote site as a putIfAbsent.  Note that 
putIfAbsent fails if a write has already happened across the cluster, thereby 
guaranteeing the most current copy is applied to the backup.

A note about deletes:

We are waiting for tombstoning functionality (ISPN-2362), but it is not 
available yet.  So we have an alternate implementation:  

If the remote state coordinator intercepts commands (or receives commands of 
some sort), we know when there's been a delete processed.  So we can keep some 
sort of state in addition to the map above.
This means if the processing key from the state transfer has already been 
deleted by a separate write operation, we can simply skip over that key.

Once all state has been sent for all backed-up caches, the XSiteStateProducer 
will send a new command to finalize the state transfer operations: 
XSiteFinalizeStateTransferCommand  

This may involve some of the following:

 - Removes all the transactions marked as "committed". 
 - The XSiteStateConsumer must ONLY handle CommitCommands that are still kept 
in the remote2localTx map to in order to finish them up.  This can mean 
committing all local transactions, or rolling them back in the case no messages 
are received during 
 - Clean up the maps. (???)

TBD:  Can this be leveraged from the StaleTransactionCleanupService?
_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Reply via email to