Re: Issue 940 in mobicents: Electors covering different cache configuration + covering different conf in listener

mobicents Mon, 14 Dec 2009 09:37:16 -0800

Updates:
        Labels: -Version-2.0.00 -Release-Type-Candidate Version-2.1.00

Comment #7 on issue 940 by EMMartins: Electors covering different cache configuration + covering different conf in listener

http://code.google.com/p/mobicents/issues/detail?id=940

The buddy group config still has a lot of grey zones, with unexpected or not logically understandable JBoss Cache behavior, so its inclusion in Mobicents JAIN SLEE 2.x is postponed to version 2.1.

Here is a log of the current state:

[16:20] <baranowb> lets consider two scenarios
[16:20] <baranowb> 1. no gravitation

[16:21] <baranowb> in this scenario each cache node acts as standalone, so from point of cache user there is no HA, only FT - nodes dont see data of other
[16:21] <baranowb> Ive told You about it, thought there is a way to overcome this.

[16:22] <martins> how can you have ft
[16:22] <martins> if nothin is replicated
[16:22] <baranowb> no,
[16:22] <baranowb> see
[16:22] <baranowb> lets say we have two nodes
[16:22] <baranowb> N_1 and N_2
[16:23] <baranowb> each has
[16:23] <baranowb> data
[16:23] <baranowb> in /ac,/timers, bla bla bal
[16:23] <baranowb> right?
[16:23] <martins> y
[16:24] <baranowb> but cahce structure looks like
[16:24] <baranowb> damn
[16:24] <baranowb> _/ac
[16:24] <baranowb> _/timers
[16:24] <baranowb> _/_BUDDY_BACKUP_/N_#/ac
[16:24] <baranowb> _/_BUDDY_BACKUP_/N_#/timers
[16:24] <baranowb> now on N_1 we dont see part thats in _BACKUP_

[16:25] <baranowb> that is, we dont see N_2 data, the same applies to view from N_2

[16:25] <baranowb> following ?
[16:25] <martins> y
[16:25] <baranowb> ok
[16:25] <baranowb> what happens on N_1 failure is that
[16:26] <baranowb> N_2 takes ownership of this data in  _/_BUDDY_BACKUP_/N_1
[16:26] <baranowb> and now its visible
[16:26] <martins> right
[16:26] <baranowb> and everythign is recreated and works
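
The no-gravitation scenario described above can be sketched as a small model. This is only an illustration under stated assumptions: `Node`, `put` and `take_over` are hypothetical names, plain dictionaries stand in for the cache tree, and nothing here is JBoss Cache API. Each node sees only its own root; a buddy keeps the owner's data in a separate backup region until it takes over.

```python
# Hypothetical model of the layout in the chat: _/ac, _/timers and
# _/_BUDDY_BACKUP_/<owner>/... on each node. Not JBoss Cache code.

class Node:
    def __init__(self, name):
        self.name = name
        self.root = {}    # data this node owns, e.g. {"ac": {...}}
        self.backup = {}  # owner name -> copy of that owner's data


def put(owner, buddies, region, key, value):
    """Write into the owner's root and copy the entry to each buddy's backup."""
    owner.root.setdefault(region, {})[key] = value
    for buddy in buddies:
        buddy.backup.setdefault(owner.name, {}).setdefault(region, {})[key] = value


def take_over(survivor, failed_name):
    """On failure, move the failed owner's backup region into the survivor's root."""
    promoted = survivor.backup.pop(failed_name, {})
    for region, entries in promoted.items():
        survivor.root.setdefault(region, {}).update(entries)
    return promoted


n1, n2 = Node("N_1"), Node("N_2")
put(n1, [n2], "ac", "ac-1", "state")

# Before the failure N_2 does not see N_1's data in its own root.
visible_before = "ac" in n2.root

# N_1 fails, N_2 takes ownership and the data becomes visible.
take_over(n2, "N_1")
visible_after = n2.root["ac"]["ac-1"]
```

In this model a request for `ac-1` that lands on N_2 before the failure finds nothing, which is why the scenario needs load-balancer affinity.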
[16:27] <martins> ok, so why you don't have HA
[16:28] <baranowb> ok, consider, that with cluster wide replciation
[16:28] <baranowb> if we remove something from _/ac
[16:28] <baranowb> it is removed from each node

[16:29] <baranowb> with case above each node is considered as data owner, and only owning node can remove
[16:30] <martins> correct
[16:30] <baranowb> "can" - its not entirely true
[16:30] <baranowb> so see
[16:30] <baranowb> N_1 creates service, acs, timers
[16:30] <baranowb> now we fire on some ac in other container
[16:31] <baranowb> ac is not in _/ac
[16:31] <martins> ok, I think I understand your misunderstanding :)
[16:31] <baranowb> so we would have to lookup _BACKUP_ and move it to _/
[16:31] <baranowb> lol
[16:31] <martins> that is not "no HA"
[16:31] <baranowb> so what did I mis :)
[16:31] <martins> you have HA
[16:32] <martins> because HA is simply load balancing
[16:32] <baranowb> ach, ok, I was thinkiing more of "create on 1, fire to 2"
[16:32] <martins> what you don't have is support for "loose" load balancing
[16:32] <martins> you need afinity
[16:32] <baranowb> ok, thats ql, and this will work

[16:33] <baranowb> so if this is what we want, than lets move to second scenario Ive been trying to make work

[16:33] <martins> but that was expected right from the beginning, that is why there is also gravitation option
[16:33] <martins> right? :)
[16:34] <baranowb> y, had something else in mind by HA term than :)
[16:34] <baranowb> ok about case #2

[16:34] <martins> ok, so that one works as you explain, it just requires balancing affinity to support fail over?
[16:34] <baranowb> y
[16:34] <martins> ok
[16:34] <baranowb> and everything works like a charm
[16:34] <martins> now with gravitation
[16:34] <baranowb> ok, with gravitation it sucks :)
[16:35] <baranowb> it sucks at point where new buddy joins
[16:35] <baranowb> data ownwership gets screwed
[16:35] <baranowb> totaly
[16:35] <baranowb> consider scenario
[16:35] <baranowb> N_1,N_2, N_3
[16:35] <baranowb> N_1, fails, N_2, takes over, as expected,

[16:36] <baranowb> ok, at this point, N_2 is owner, has running timers, what ever
[16:37] <baranowb> we can still fire on N_3, and it works, N_3 just has some entries for new acs _/ac's
[16:37] <baranowb> this is quite good imho
[16:37] <baranowb> but now consider taht N_1 rejoins
[16:37] <baranowb> now everything gets screwed due to gravitation

[16:38] <baranowb> N_1 cache takes over all data from buddy group, I mean totaly everything
[16:38] <baranowb> no matter how I set overrides, invoce CacheData init
[16:38] <martins> so it gets its data back
[16:38] <martins> ?
[16:38] <baranowb> it takes everything
[16:38] <martins> what is everything
[16:39] <baranowb> everything that is in buddy group
[16:39] <baranowb> so it gets data from N_2 and N_3
[16:39] <baranowb> leaving their cache _/ empty
[16:39] <martins> wow
[16:40] <martins> that sounds like a bad config or a bug

[16:40] <baranowb> posted on jbc about it (not sure why timestamp shows 2 days ago, when forums did not work)
[16:40] <baranowb> http://community.jboss.org/thread/85420

[16:41] <baranowb> 1.st there is some page in wiki which says that with gravitation cache should have structure like
[16:41] <baranowb> _/node_address/ac
[16:41] <baranowb> foubnd it by accident, dunno why its not in user guide
[16:41] <baranowb> 2. nd thing, if cache is set to local mode || overrideoption.setSkipDataGravitation(true)
[16:41] <baranowb> it should not happen

[16:42] <baranowb> but somewhere in debug log I see that overrideoption are cleaned and everything is fetched
[16:43] <martins> 1. useless
[16:43] <baranowb> y, we should be able to work with override options
[16:44] <martins> I can't quickly understand your post

[16:44] <baranowb> and its the same thing as having data in backup without gravitation

[16:44] <martins> you are complaining that once N2 starts N1 gets its data ?
[16:44] <martins> in /BACKUP_...
[16:44] <martins> ?

[16:45] <baranowb> hmm, maybe I should rephrase it than, its the same scenario

[16:45] <martins> you need to provide a bit more simple use cases

[16:45] <baranowb> when another node starts and joins, it gets whole data of buddy group

[16:45] <baranowb> that is
[16:45] <martins> lets say every NODE writes a /ac { x = ip_address }
[16:45] <martins> just data
[16:45] <martins> just that
[16:45] <baranowb> ok
[16:45] <baranowb> so
[16:46] <baranowb> N_1 has
[16:46] <baranowb> (lets consider fqns, not data)
[16:46] <baranowb> N_1
[16:46] <martins> ok then ac/ip_address
[16:46] <baranowb> _/ac/N1_IP
[16:46] <martins> is what each writes
[16:46] <martins> right
[16:47] <baranowb> _/_BUDDY_BACKUP_/_N2_/ac/N_2_IP
[16:47] <baranowb> _/_BUDDY_BACKUP_/_N3_/ac/N_3_IP
[16:47] <baranowb> N_2
[16:47] <baranowb>  _/ac/N2_IP
[16:47] <baranowb> _/_BUDDY_BACKUP_/_N1_/ac/N_1_IP
[16:47] <baranowb> _/_BUDDY_BACKUP_/_N3_/ac/N_3_IP
[16:47] <baranowb> similar N_@
[16:48] <baranowb> 3
[16:48] <baranowb> right ?
[16:48] <martins> 2 backup nodes of N3 ?
[16:48] <baranowb> N2 and N1
[16:49] <baranowb> N_3
[16:49] <baranowb> _/ac/N3_IP
[16:49] <baranowb> _/_BUDDY_BACKUP_/_N1_/ac/N_1_IP

[16:49] <martins> that is a bit off topic, but in a cluster with N = 3 each node has 2 backups?
[16:49] <baranowb> _/_BUDDY_BACKUP_/_N2_/ac/N_2_IP
[16:49] <baranowb> its configurable, depends on nmber of buddies
[16:49] <martins> ok
[16:49] <baranowb> You can have 1, 2,3....
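
How the number of buddies decides which backup regions each node holds can be sketched as follows. This is a minimal model that assumes buddies are simply the next members in the cluster view, wrapping around; `pick_buddies` and `backups_held` are hypothetical helpers, not JBoss Cache functions, and the real buddy locator may apply further rules.

```python
# Assumption: a node's buddies are the next `num_buddies` members in the view.

def pick_buddies(members, node, num_buddies):
    """Return the members that keep a backup copy of `node`'s data."""
    i = members.index(node)
    count = min(num_buddies, len(members) - 1)
    return [members[(i + k) % len(members)] for k in range(1, count + 1)]


def backups_held(members, num_buddies):
    """Map each member to the owners whose backup region it stores."""
    held = {m: [] for m in members}
    for owner in members:
        for buddy in pick_buddies(members, owner, num_buddies):
            held[buddy].append(owner)
    return held


members = ["N_1", "N_2", "N_3"]
two_buddies = backups_held(members, 2)
one_buddy = backups_held(members, 1)
```

With two buddies in a three-node view every node stores the backup regions of both other nodes, which matches the layout typed out in the chat; with one buddy each node stores exactly one backup region.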
[16:50] <baranowb> so N_1 dies
[16:50] <baranowb> N_2 is elected as owner
[16:50] <baranowb> so data looks like
[16:50] <baranowb> _/ac/N2_IP
[16:50] <baranowb> _/ac/N1_IP
[16:50] <baranowb> _/_BUDDY_BACKUP_/_N3_/ac/N_3_IP
[16:50] <baranowb> N_3 data
[16:50] <baranowb> _/ac/N3_IP
[16:50] <baranowb> _/_BUDDY_BACKUP_/_N2_/ac/N_2_IP
[16:50] <baranowb> _/_BUDDY_BACKUP_/_N2_/ac/N_1_IP
[16:51] <baranowb> right?
[16:51] <martins> y
[16:51] <baranowb> ok,. now N_1 starts, and after start data looks like
[16:51] <baranowb> N_1 data
[16:51] <baranowb> _/ac/N2_IP
[16:51] <baranowb> _/ac/N1_IP
[16:51] <baranowb> _/ac/N3_IP
[16:51] <baranowb> no backup
[16:52] <baranowb> on N2, or N3
[16:52] <baranowb> _/ac/
[16:52] <baranowb> _/_BUDDY_BACKUP_/_N1_/ac/N_1_IP
[16:52] <baranowb> _/_BUDDY_BACKUP_/_N1_/ac/N_2_IP
[16:52] <baranowb> _/_BUDDY_BACKUP_/_N1_/ac/N_3_IP
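
The before and after states of the rejoin can be written down as a small model. It reproduces only the outcome reported in the chat and makes no claim about how JBoss Cache implements gravitation; `node` and `rejoin_with_gravitation` are hypothetical names, and sets of entry names stand in for the children of /ac.

```python
# Models the reported result only: the rejoining node ends up with every /ac
# entry, the other nodes keep an empty /ac plus a backup region for N_1.

def node(name, root_ac, backup):
    return {"name": name, "root": {"ac": set(root_ac)}, "backup": backup}


def rejoin_with_gravitation(rejoining, others):
    pulled = set()
    for n in others:
        pulled |= n["root"]["ac"]
        n["root"]["ac"] = set()          # /ac is left empty on the old owners
    rejoining["root"]["ac"] = set(pulled)
    rejoining["backup"] = {}             # "no backup" on N_1, as reported
    for n in others:
        n["backup"] = {rejoining["name"]: {"ac": set(pulled)}}


# State after N_1 died and N_2 was elected owner of its data.
n1 = node("N_1", [], {})
n2 = node("N_2", ["N2_IP", "N1_IP"], {"N_3": {"ac": {"N3_IP"}}})
n3 = node("N_3", ["N3_IP"], {"N_2": {"ac": {"N2_IP", "N1_IP"}}})

rejoin_with_gravitation(n1, [n2, n3])
print(sorted(n1["root"]["ac"]))  # ['N1_IP', 'N2_IP', 'N3_IP']
```

The point of the model is the ownership change: after the rejoin N_2 no longer owns the entries it was elected to take over.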

[16:54] <martins> that is the normal behavior you get with default buddy group + data gravitation on config ?

[16:55] <baranowb> y, this is somewhat default way, I mean if there is override used

[16:55] <baranowb> to control gravitation
[16:55] <martins> lets stick to default one first
[16:56] <baranowb> so first scenario is goal  for now ?
[16:56] <martins> what is important to know
[16:56] <martins> is if this behavior is expected
[16:56] <martins> if it is not who is fault

[16:56] <martins> if it is what is the reason, since it doesn't seem logical at first
[16:57] <martins> is there any prob you get by using this default buddy groups with data gravitation ?

[16:57] <baranowb> y, I will add simpler explanation to post and @ manik directly

[16:57] <baranowb> wdym?

[16:57] <martins> I will ask him and add you to cc, I have something else I need to clarify with him

[16:58] <martins> I mean, is there any issue because when N1 rejoins it sucks all data ?

[16:58] <baranowb> y
[16:58] <baranowb> see, on failure we look through _BACKUP_
[16:58] <baranowb> and reinit local resources
[16:59] <baranowb> (this happens on wining node)
[16:59] <baranowb> now, if N_2 fails (real owner before N_1 sucks)

[16:59] <martins> why we do that, doesn't jboss cache move the data on its own

[16:59] <martins> ?

[17:00] <baranowb> 1. there is bug :), it does not happen always, its because on 3.1.0 there is that bug which causes jbc not to fire BG events (hence it does not update internals and move data)
[17:00] <baranowb> Ive told about it and it seems fixed in 3.2.1

[17:00] <baranowb> our listener handles everything, basically it does what cache should

[17:00] <martins> this also happens without data gravitation
[17:00] <martins> ?
[17:01] <baranowb> y
[17:01] <baranowb> its buddy group membership logic
[17:01] <martins> hmm I don't like that
[17:01] <martins> maybe we should move this to post ga

[17:01] <baranowb> 2. its more efficient to perform it only localy, no network traffic
[17:02] <martins> if jbc internals are not working correctly with current AS version

[17:02] <baranowb> and we iterate only data that needs to be inspected
[17:02] <martins> for budy groups
[17:02] <martins> it's a  big reason to skip it
[17:02] <baranowb> afaik its the only thing, atleast that Ive noticed
[17:03] <baranowb> well one thing is
[17:03] <martins> do you thin that this solution going over what jbc does
[17:04] <martins> can be significantly better than cluster wide
[17:04] <martins> for this first version ?

[17:04] <baranowb> our code has one advantage - only one node is owner of data

[17:04] <martins> I mean, do you think it is worth
[17:04] <baranowb> in jbc impl, data is copied to _/ of each node
[17:04] <baranowb> and imho this is not what we want
[17:05] <baranowb> data owner is consistent with election policy
[17:05] <baranowb> see
[17:05] <baranowb> in jbc impl
[17:05] <baranowb> on buddy failure, each node moved _/BACKUP_/N/
[17:06] <baranowb> to its _/

[17:06] <baranowb> so if node fails, and there are two buddies, each has copy of failed buddy at its root after failure
[17:07] <martins> that sounds like another bug

[17:07] <baranowb> and iirc impl of cluster we have expects only elected node to have it, right?

[17:08] <martins> well, does jbc updates all nodes when one changes that data ?

[17:08] <baranowb> hmm, dont think so, did not see this happen

[17:09] <martins> I mean, if one takes ownership in mobicents side, such as restoring a timer, and when the timer fires it deletes its data, does jbc deletes the data from all nodes where it restored the data?
[17:09] <martins> that is a huge leak
[17:09] <martins> if it doesn't delete in all
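
The leak being discussed can be illustrated by comparing the two promotion strategies. This is a sketch under a loud assumption: a local delete is NOT propagated to the other buddies, which the chat leaves unconfirmed. `promote_everywhere`, `promote_on_elected` and `make` are hypothetical helpers, and the elected-only variant simply drops the losers' backup copies, whereas a real cluster would re-replicate the winner's data as its own backup.

```python
# Assumption: deletes are local only. Dicts stand in for the cache tree.

def promote_everywhere(buddies, failed):
    """Every buddy moves the failed node's backup into its own root."""
    for b in buddies:
        b["root"].update(b["backup"].pop(failed, {}))


def promote_on_elected(buddies, failed, elected):
    """Only the elected buddy moves the failed node's backup into its root."""
    for b in buddies:
        data = b["backup"].pop(failed, {})
        if b["name"] == elected:
            b["root"].update(data)


def make(name):
    return {"name": name, "root": {}, "backup": {"N_1": {"timer-1": "pending"}}}


# Strategy 1: both buddies promote, then the timer fires on N_2 only.
a, b = make("N_2"), make("N_3")
promote_everywhere([a, b], "N_1")
del a["root"]["timer-1"]
stale_everywhere = "timer-1" in b["root"]

# Strategy 2: only the elected node promotes, then the timer fires there.
c, d = make("N_2"), make("N_3")
promote_on_elected([c, d], "N_1", elected="N_2")
del c["root"]["timer-1"]
stale_elected = "timer-1" in d["root"]
```

Under that assumption the first strategy leaves a stale `timer-1` at the root of N_3, while the elected-only strategy leaves none.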

[17:10] <baranowb> hehe, dont know, cause I assumed in impl that others should not get it, so only one node has data :)
[17:10] <baranowb> possibly gravitation removes it, but cant say for sure

[17:11] <martins> well, must do the same when a node gets data through gravitation
[17:11] <baranowb> since its direct call, it should gravitate remove operation

[17:11] <martins> when it deletes must delete from all

[17:13] <martins> I'm not getting into this buddy group thing with that much "grey" zones

[17:14] <martins> in that N1,N2,N3
[17:14] <martins> when N1 rejoins

[17:15] <martins> does JBC really copy all to N1 and deletes data from N2 and N3, or is it our code doing the N2 and N3 deletes?

[17:15] <baranowb> its jbc, Ive tried it - I removed our calls to data to see what is going on
[17:15] <baranowb> besides there is jbc log
[17:16] <baranowb> about gravitation
[17:16] <martins> does it invoke any callback in N2 and N3 when it does that
[17:16] <baranowb> 16:04:08,234 TRACE [MVCCNodeHelper] Node /ac is not in context, fetching from container.
[17:16] <baranowb> 16:04:08,234 TRACE [DataGravitatorInterceptor] Checking local existence of requested fqn /ac
[17:16] <baranowb> 16:04:08,234 TRACE [DataGravitatorInterceptor] Gravitating from local backup tree
[17:17] <baranowb> 16:04:08,234 TRACE [CallInterceptor] Executing command: GravitateDataCommand{fqn=/ac, searchSubtrees=true}.
[17:18] <baranowb> 16:04:08,265 TRACE [InvocationContextInterceptor] Invoked with command EvictCommand{fqn=/_BUDDY_BACKUP_/127.0.0.1_3273/ac, recursive=true} and InvocationContext [InvocationContext{transaction=TransactionImple < ac, BasicAction: -560196f5:ce0:4b225f4b:63 status: ActionStatus.RUNNING >, globalTransaction=GlobalTransaction:<127.0.0.1:3306>:0, transactionContext=TransactionEntry
[17:18] <baranowb> modificationList: null, optionOverrides=Option{failSilently=false, cacheModeLocal=false, dataVersion=null, suppressLocking=false, lockAcquisitionTimeout=-1, forceDataGravitation=false, skipDataGravitation=false, forceAsynchronous=false, forceSynchronous=false, suppressPersistence=false, suppressEventNotification=false}, originLocal=true, bypassUnmarshalling=false}]
[17:18] <baranowb> 16:04:08,265 TRACE [InvocationContextInterceptor] Setting up transactional context.
[17:18] <baranowb> 16:04:08,265 TRACE [InvocationContextInterceptor] Setting tx as TransactionImple < ac, BasicAction: -560196f5:ce0:4b225f4b:63 status: ActionStatus.RUNNING > and gtx as GlobalTransaction:<127.0.0.1:3306>:0
[17:18] <baranowb> 16:04:08,265 TRACE [TxInterceptor] local transaction exists - registering global tx if not present for Thread[main,5,jboss]
[17:18] <baranowb> 16:04:08,265 TRACE [TxInterceptor] Associated gtx in txTable is GlobalTransaction:<127.0.0.1:3306>:0
[17:18] <baranowb> 16:04:08,265 TRACE [TxInterceptor] Transaction TransactionImple < ac, BasicAction: -560196f5:ce0:4b225f4b:63 status: ActionStatus.RUNNING > is already registered.
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCLockManager] Attempting to lock /_BUDDY_BACKUP_/127.0.0.1_3273/ac
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCLockManager] Attempting to lock /_BUDDY_BACKUP_/127.0.0.1_3273/ac/ACH=.....
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCNodeHelper] Retrieving wrapped node /_BUDDY_BACKUP_/127.0.0.1_3273/ac/ACH=.....
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCLockManager] Attempting to lock /_BUDDY_BACKUP_/127.0.0.1_3273/ac/ACH=..../...
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCLockManager] Attempting to lock /_BUDDY_BACKUP_/127.0.0.1_3273/ac/ACH=.../.../...
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCLockManager] Attempting to lock /_BUDDY_BACKUP_/127.0.0.1_3273/ac/ACH=.../...
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCLockManager] Attempting to lock /_BUDDY_BACKUP_/127.0.0.1_3273/ac/ACH=/../.../...
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCLockManager] Attempting to lock /_BUDDY_BACKUP_/127.0.0.1_3273/ac/ACH=...
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCNodeHelper] Retrieving wrapped node /_BUDDY_BACKUP_/127.0.0.1_3273/ac/ACH=SERVICE>>ServiceID[name=TimerExampleService,vendor=org.mobicents,version=1.0]
[17:18] <baranowb> 16:04:08,265 TRACE [CallInterceptor] Executing command: EvictCommand{fqn=/_BUDDY_BACKUP_/127.0.0.1_3273/ac, recursive=true}.
[17:18] <baranowb> bla bla bla and so on
[17:19] <baranowb> I dont have log for other nodes
[17:19] <martins> which node is that
[17:19] <baranowb> N_1
[17:19] <martins> what is 3273
[17:19] <martins> N3?
[17:19] <baranowb> port
[17:19] <baranowb> y, buddies are stored as
[17:19] <baranowb> IP_PO
[17:19] <baranowb> PORT
[17:19] <martins> I know it is port :p
[17:19] <baranowb> :)
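
The naming of backup regions seen in the trace, `/_BUDDY_BACKUP_/<ip>_<port>/...`, can be taken apart with a few lines. This is a sketch that assumes exactly the path shape visible in the log above; `split_backup_fqn` is a hypothetical helper, not part of JBoss Cache.

```python
BACKUP_ROOT = "_BUDDY_BACKUP_"


def split_backup_fqn(fqn):
    """Split a backup path into (owner host, owner port, original path).

    Returns None when the path is not under the backup root.
    """
    parts = [p for p in fqn.split("/") if p]
    if len(parts) < 2 or parts[0] != BACKUP_ROOT:
        return None
    # The owner is stored as IP_PORT, so split on the last underscore.
    host, _, port = parts[1].rpartition("_")
    return host, int(port), "/" + "/".join(parts[2:])


owner = split_backup_fqn("/_BUDDY_BACKUP_/127.0.0.1_3273/ac")
not_backup = split_backup_fqn("/ac")
```

For the path in the trace this yields host `127.0.0.1`, port `3273` and the original path `/ac`, which is the data being evicted from the backup region on N_1.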
[17:20] <baranowb> its N3 I think
[17:20] <martins> that is on rejoin ?
[17:20] <baranowb> y
[17:21] <martins> you are fetching data from N3
[17:21] <martins> and it has the data on back tree
[17:21] <martins> what is wrong there
[17:21] <baranowb> its local call, and gravitation should be supresed
[17:22] <baranowb> for this case
[17:22] <martins> why is it local
[17:22] <baranowb> cause we should not get /ac
[17:22] <baranowb> from all other nodes
[17:23] <martins> you need to get AC to get a child
[17:23] <baranowb> well thats not entirely true - see
[17:23] <baranowb> when all nodes are running
[17:23] <baranowb> one dies
[17:23] <baranowb> N2 takes over
[17:23] <baranowb> we can create on N3 under /ac
[17:24] <baranowb> nothing happens, no data sucking, no nothing, just creating something under /ac
[17:24] <martins> that is because ac already exists?
[17:25] <baranowb> y
[17:25] <baranowb> atleast I suspect thats the cause
[17:25] <baranowb> there is /ac in local cache instance

[17:26] <martins> baranowb, I think there are much to uncover and work on this jbc config

[17:27] <martins> I feel it is far from a stable usable config for our HA
[17:27] <martins> do you agree?
[17:27] <baranowb> with gravitation i agree,
[17:28] <baranowb> first case could be of use, it seems to work ok
[17:28] <baranowb> and result is deterministic
[17:29] <martins> I'm not sure

[17:29] <baranowb> but if there is no urge, we can leave it, I still have few valid fixes
[17:29] <martins> it looks like a use case that may be more trouble making than life savior
[17:29] <martins> would be preferable to split a cluster in smaller clusters
[17:30] <baranowb> ok, I agree, we can freeze this issue with patch
[17:30] <baranowb> yes, exactly

[17:30] <martins> don't freeze, we need to continue working on this in parallel

[17:30] <baranowb> freeze for this release I mean :)
[17:31] <martins> till we have our doubts all answered

[17:31] <martins> the buddy group + gravitation has the potentical to become the default setup
[17:31] <martins> potential
[17:31] <martins> and thus we should make it a big priority in HA
[17:32] <martins> well, we should point it for what kind of release, 2.1 ?
[17:33] <baranowb> y


--
You received this message because you are listed in the owner
or CC fields of this issue, or because you starred this issue.
You may adjust your issue notification preferences at:
http://code.google.com/hosting/settings
