Re: [infinispan-dev] xsite state transfer design

Erik Salter Thu, 04 Oct 2012 13:39:20 -0700

Hi all,


Per the wiki and a conversation with Mircea/Bela, I've attempted to capture
a more detailed view into the design.  

 

If it's useful, please feel free to use it.

 

Thanks,

 

Erik

 

From: [email protected]
[mailto:[email protected]] On Behalf Of Mircea Markus
Sent: Tuesday, October 02, 2012 12:58 PM
To: infinispan
Subject: [infinispan-dev] xsite state transfer design

 

Hi,

 

After discussing with Dan and Adrian, I've updated the xsite state transfer
document with a suggested design[1].

Note that this is out of scope for 5.2. Any review is very welcomed.

 

[1]
https://community.jboss.org/wiki/DesignForCrossSiteReplication#State_Transfe
r_between_sites 

 

 

Cheers,

-- 
Mircea Markus

Infinispan lead (www.infinispan.org)

Inter-site state transfer is the mechanism in which a running site will push
its state to another site. This currently is a manual process, applicable in
the cases where a data center is being brought online or recovering a failed
data center. It follows the same design principals of non-blocking state
transfer. It is not enough to simply iterate over the keyset and apply the
current values, since they may have been modified or deleted during normal
operation.

JMX OPERATIONS
----------

Per cache manager:

1. pushState( String siteName ): This invokes, on a per-cache manager basis,
the inter-site state transfer method. This will use the configured backup
policy of the cluster.
2. pushState( String siteName, String cacheName ): Similar to the above, but
invokes it only using the policy of the defined cache.
2. pushState( String siteName, String cacheName, int chunkSite ): Similar to
the above, with the number of keys to transmit at once
3. stop( String siteName ): This aborts any state transfer operation.
4. stop( String siteName, String cacheName ):
5. KeyState getStatus( String siteName, String cacheName ): This returns the
number of keys that have been successfully pushed to the remote site vs. the
number of keys to push.

COMPONENTS:
----------

There are two main components of inter-site state transfer. There is the
XSiteStateProvider on the pushing "site" side, and there is the
XSiteStateConsumer on the remote node. Note that these are logical components,
and their implementation may not follow the same pattern.

XSiteStateProducer
- OutboundTransferTask
XStiteStateConsumer
- remote2LocalTx map
- removedKey map
- Running/not running state management (may be more than one variable)

COMMANDS
---------

- XSiteStateTransferRequestCommand (producer --> consumer): This is invoked on
a per-cache manager basis that alerts the receiving site to incoming requests
and sets internal state accordingly
- XSiteStateRequestCommand (producer-->consumer): Invoked to transmit current
tx information to the remote site.
- XSiteFinalizeStateTransferCommand (producer --> consumer ): This is invoked
when the state provider has generated all the state necessary for transmitting
the data.

Note that the above are issued on a per-cache manager basis. The receiving
cluster does not know about which caches will be transmitted, since teh
XSiteBackup impl tracks them on tx boundaries anyway. It is the responsibility
of the Producer to aggregate all status before issuing the finalize command.

DESIGN OVERVIEW
---------------
Conceptually, inter-site state transfer uses the NBST design pattern for
producing and consuming state (and all its related semantics), while leveraging
the XSite implementation for replaying modifications on the receiving site.

To start the inter-site state transfer mechanism, a system administrator will
connect to any data grid node in the reference site and issue a JMX command
(admin console, JMX). It is important to note that the state transfer may be
invoked from any node on a local cluster.

After invoking this command, the XSiteStateProvider sends a the
XSiteRequestCommand to the SiteMaster of the remote site: .When the
XSiteStateProvider on the remote site receives this message, it sets up its
internal state and replies. The local site can start pushing state.

The XSiteStateConsumer will calculate the set of nodes that can generate the
state. These nodes will be own the primary segments for a set of keys. This
is a subset of the total nodes in the local cluster. A great deal of the state
provider implementation is already complete. This can leverage the following
class for the necessary functionality:

https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/statetransfer/OutboundTransferTask.java#L217

During this operation, the current transaction state (the current "prepares")
from the ORIGINATING site are pushed first (similar to the NBST method) in a
XSiteStateRequest command. The XSiteStateConsumer will process these tx
information and place these commands into a local map. The key is the
GlobalTransaction as generated by the pushing site. The object is a composite
of the LocalTransaction and the current status:

- prepare_received:
- commit_received:
- committed

i.e. ConcurrentMap< GlobalTransaction, CompositeTxStatus > remote2LocalTx map.

For each state, the behavior is as following:

- prepare_received, committed: Ignore this message, as it was already handled.
- commit_received: We grab the local transaction and commit it, and set the
state to "committed"
- No entry: Start a local transaction and add it to the map with the status
"prepare_received"

AFTER the transaction state is pushed, the XSiteStateConsumer will ask for the
rest of the state.

The state is applied to the remote site as a putIfAbsent. Note that
putIfAbsent fails if a write has already happened across the cluster, thereby
guaranteeing the most current copy is applied to the backup.

A note about deletes:

We are waiting for tombstoning functionality (ISPN-2362), but it is not
available yet. So we have an alternate implementation:

If the remote state coordinator intercepts commands (or receives commands of
some sort), we know when there's been a delete processed. So we can keep some
sort of state in addition to the map above.
This means if the processing key from the state transfer has already been
deleted by a separate write operation, we can simply skip over that key.

Once all state has been sent for all backed-up caches, the XSiteStateProducer
will send a new command to finalize the state transfer operations:
XSiteFinalizeStateTransferCommand

This may involve some of the following:

- Removes all the transactions marked as "committed".
- The XSiteStateConsumer must ONLY handle CommitCommands that are still kept
in the remote2localTx map to in order to finish them up. This can mean
committing all local transactions, or rolling them back in the case no messages
are received during
- Clean up the maps. (???)

TBD: Can this be leveraged from the StaleTransactionCleanupService?

_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] xsite state transfer design

Reply via email to