Hi,

We have an issue regarding state transfer within our system. We basically 
assume that, given two clustered caches A and B, cache A can be used and 
accessed safely while cache B is starting, without any risk of data loss once 
cache B has successfully started and is being accessed. This, however, does 
not seem to be the case all the time.

You will find a test case here, with source and configuration files: 
http://www.cubeia.com/misc/statetransfer/src.zip

The test stresses the state transfer. The test class is documented, but 
in short, it sets up:

  | * Two caches, but starts only the first.
  | * Fifty objects with a "counter", mapped by id within cache one.
  | * Fifty threads associated with one object each (by id) and cache one.
  | 
The test then goes like this:

  | * The threads are started. Each thread accesses its associated object, 
checks that the counter is correct (i.e. that it can be incremented by one 
without losing intermediate states), increments the counter and repeats.
  | * Cache two is started.
  | * Half of the threads are re-associated with cache two instead of cache 
one; their execution is not halted, however.
  | 
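As a rough illustration of the per-thread check described above (this is not the actual test class, which is in the zip): the sketch below uses a plain ConcurrentHashMap as a stand-in for both caches, so it models ideal replication and should report no errors. The names SequenceCheckSketch, Worker, reassociate and run are hypothetical and made up for this sketch; the real test runs the same kind of loop against JBoss Cache instances.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class SequenceCheckSketch {

    /** One worker per object id; checks that its counter never skips, repeats or goes missing. */
    static final class Worker extends Thread {
        private final int id;
        private final int iterations;
        private final AtomicInteger errors;
        // volatile so that a re-association made by the main thread is seen by the running worker
        private volatile Map<Integer, Integer> cache;

        Worker(int id, Map<Integer, Integer> cache, int iterations, AtomicInteger errors) {
            this.id = id;
            this.cache = cache;
            this.iterations = iterations;
            this.errors = errors;
        }

        /** Switches this worker to another cache handle without halting it. */
        void reassociate(Map<Integer, Integer> other) {
            this.cache = other;
        }

        @Override
        public void run() {
            int expected = 0;
            for (int i = 0; i < iterations; i++) {
                Map<Integer, Integer> current = cache; // read the handle once per round
                Integer counter = current.get(id);
                // null = missing state, mismatch = sequence error
                if (counter == null || counter != expected) {
                    errors.incrementAndGet();
                    return;
                }
                expected++;
                current.put(id, expected);
            }
        }
    }

    /** Runs the scenario and returns the number of workers that saw an error. */
    static int run(int objects, int iterations) {
        Map<Integer, Integer> cacheOne = new ConcurrentHashMap<>();
        for (int id = 0; id < objects; id++) {
            cacheOne.put(id, 0);
        }
        AtomicInteger errors = new AtomicInteger();
        Worker[] workers = new Worker[objects];
        for (int id = 0; id < objects; id++) {
            workers[id] = new Worker(id, cacheOne, iterations, errors);
            workers[id].start();
        }
        // ASSUMPTION: "cache two" shares the backing map, i.e. perfect replication.
        // In the real test this is a second clustered cache that is started here.
        Map<Integer, Integer> cacheTwo = cacheOne;
        for (int id = 0; id < objects; id += 2) {
            workers[id].reassociate(cacheTwo); // half of the threads move over
        }
        for (Worker w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return errors.get();
    }

    public static void main(String[] args) {
        System.out.println("errors: " + run(50, 1000));
    }
}
```

With a real second cache in place of the shared map, a null counter corresponds to the "missing state" failure and a mismatch to the "sequence error" failure.
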
The test fails on either JBoss Cache exceptions or 1) sequence errors (i.e. lost 
intermediate states); or 2) missing state (i.e. an attempt to access an object in 
cache two which has not been replicated at all).  We have tested the following 
setups in different permutations: REPL_SYNC/REPL_ASYNC, user transaction/no 
user transaction, and buddy replication enabled or disabled. So far our results 
look somewhat like this (and they do match what we're seeing in our main 
system):

SYNCH + TRANS + NO BUDDY
Fails to replicate to cache two with a replication exception caused by a 
suspected member exception. 

ASYNCH + TRANS + NO BUDDY
Sequence errors. Is this expected?

SYNCH + NO TRANS + NO BUDDY
Success.

ASYNCH + NO TRANS + NO BUDDY
Sequence errors. Is this expected? There's also "cache not in started state" 
errors on shutdown.

SYNCH + TRANS + BUDDY
Fails with a time out exception, followed by a load of subsequent exceptions.

ASYNCH + NO TRANS + BUDDY
Sequence errors. Is this expected? Object not found (!). Also, depending on 
whether "loopback" is set in the jgroups stack or not you get slightly 
different behaviour. With loopback=true you get time out exceptions on buddy 
backup nodes. With loopback=false you get "cache not in started state" errors 
on shutdown.

ASYNCH + TRANS + BUDDY
(see above, adding a user transaction does not change the behaviour)

We're primarily interested in getting REPL_ASYNC to work with user transactions 
and buddy replication (although we're aware that asynchronous replication might 
not work in this scenario, in which case synchronous would be ok). So, a few 
initial questions:

1) Is our understanding of the cache correct (see first paragraph)? If not, 
what prerequisites are needed to safely start a new node in a cluster? 

2) If our understanding is somewhat ok, is the test correct? Obviously I may 
very well have screwed up somewhere in the code :-)

If 1 && 2: Then the test seems to point out, at least, unexpected behaviour.  

Also, I'm aware that these might be taxing and time-consuming questions to 
answer or indeed even verify - even if there's no issue involved - so please 
indicate if there's any support agreement or service which would enable us to 
proceed faster, better or at all :-) You can reach me by PM or emailing me at 
(my name as written below, first name, middle initial and surname in lower case 
without spaces but separated by dots)@cubeia.com.

Cheers -
Lars J. Nilsson 
www.cubeia.com

View the original post : 
http://www.jboss.com/index.html?module=bb&op=viewtopic&p=4149379#4149379
