Updates:
Labels: -Version-2.0.00 -Release-Type-Candidate Version-2.1.00
Comment #7 on issue 940 by EMMartins: Electors covering different cache configuration + covering different conf in listener
http://code.google.com/p/mobicents/issues/detail?id=940
The buddy group config still has a lot of grey zones, with unexpected or not logically understandable JBoss Cache behavior, so its inclusion in Mobicents JAIN SLEE 2.x is postponed to version 2.1.
Here is a log of the current state:
[16:20] <baranowb> let's consider two scenarios
[16:20] <baranowb> 1. no gravitation
[16:21] <baranowb> in this scenario each cache node acts as a standalone node, so from the cache user's point of view there is no HA, only FT: nodes don't see each other's data
[16:21] <baranowb> I've told you about it, I thought there was a way to overcome this.
[16:22] <martins> how can you have FT
[16:22] <martins> if nothing is replicated
[16:22] <baranowb> no,
[16:22] <baranowb> see
[16:22] <baranowb> lets say we have two nodes
[16:22] <baranowb> N_1 and N_2
[16:23] <baranowb> each has
[16:23] <baranowb> data
[16:23] <baranowb> in /ac, /timers, bla bla bla
[16:23] <baranowb> right?
[16:23] <martins> y
[16:24] <baranowb> but the cache structure looks like
[16:24] <baranowb> damn
[16:24] <baranowb> _/ac
[16:24] <baranowb> _/timers
[16:24] <baranowb> _/_BUDDY_BACKUP_/N_#/ac
[16:24] <baranowb> _/_BUDDY_BACKUP_/N_#/timers
[16:24] <baranowb> now on N_1 we don't see the part that's in _BACKUP_
[16:25] <baranowb> that is, we don't see N_2's data; the same applies to the view from N_2
[16:25] <baranowb> following ?
[16:25] <martins> y
[16:25] <baranowb> ok
[16:25] <baranowb> what happens on N_1 failure is that
[16:26] <baranowb> N_2 takes ownership of this data in _/_BUDDY_BACKUP_/N_1
[16:26] <baranowb> and now its visible
[16:26] <martins> right
[16:26] <baranowb> and everything is recreated and works
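The scenario without gravitation can be illustrated with a minimal in-memory sketch. It is a hypothetical model, not the JBoss Cache API: `BuddyNode`, `put`, `visible` and `takeOver` are illustrative names, each node's tree is flattened into a map of FQN strings, and replication to buddies is modeled as a direct copy. It shows the two properties described above: N_2 cannot see N_1's data while N_1 is alive, and after N_1 fails, N_2 moves `/_BUDDY_BACKUP_/N_1/...` to its own root, where it becomes visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one node under buddy replication (not the JBoss Cache API).
// Keys are FQN strings such as "/ac/x" or "/_BUDDY_BACKUP_/N_1/ac/x".
class BuddyNode {
    static final String BACKUP_ROOT = "/_BUDDY_BACKUP_/";

    final String name;
    final Map<String, String> tree = new TreeMap<>();

    BuddyNode(String name) {
        this.name = name;
    }

    // The owner writes under its own root; each buddy receives a copy in its backup area.
    void put(String fqn, String value, List<BuddyNode> buddies) {
        tree.put(fqn, value);
        for (BuddyNode buddy : buddies) {
            buddy.tree.put(BACKUP_ROOT + name + fqn, value);
        }
    }

    // What a cache user on this node sees: everything outside the backup area.
    List<String> visible() {
        List<String> result = new ArrayList<>();
        for (String fqn : tree.keySet()) {
            if (!fqn.startsWith(BACKUP_ROOT)) {
                result.add(fqn);
            }
        }
        return result;
    }

    // Failover: move the failed node's backup subtree to this node's root.
    void takeOver(String failedNode) {
        String prefix = BACKUP_ROOT + failedNode;
        List<String> backupKeys = new ArrayList<>();
        for (String fqn : tree.keySet()) {
            if (fqn.startsWith(prefix + "/")) {
                backupKeys.add(fqn);
            }
        }
        // Keys are collected first so the map is not modified while iterating over it.
        for (String fqn : backupKeys) {
            String value = tree.remove(fqn);
            tree.put(fqn.substring(prefix.length()), value);
        }
    }

    public static void main(String[] args) {
        BuddyNode n1 = new BuddyNode("N_1");
        BuddyNode n2 = new BuddyNode("N_2");
        n1.put("/ac/x", "activity", List.of(n2));
        n1.put("/timers/t", "timer", List.of(n2));
        System.out.println(n2.visible()); // prints []
        n2.takeOver("N_1");               // N_1 has failed
        System.out.println(n2.visible()); // prints [/ac/x, /timers/t]
    }
}
```

Since the backup subtree stays hidden until a takeover, a request for N_1's data has to be routed to N_1 (or, after failover, to N_2); this is the load-balancing affinity requirement raised later in the log.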
[16:27] <martins> ok, so why don't you have HA
[16:28] <baranowb> ok, consider that with cluster-wide replication
[16:28] <baranowb> if we remove something from _/ac
[16:28] <baranowb> it is removed from each node
[16:29] <baranowb> with the case above each node is considered the data owner, and only the owning node can remove
[16:30] <martins> correct
[16:30] <baranowb> "can" - it's not entirely true
[16:30] <baranowb> so see
[16:30] <baranowb> N_1 creates service, acs, timers
[16:30] <baranowb> now we fire on some ac in another container
[16:31] <baranowb> ac is not in _/ac
[16:31] <martins> ok, I think I understand your misunderstanding :)
[16:31] <baranowb> so we would have to look up _BACKUP_ and move it to _/
[16:31] <baranowb> lol
[16:31] <martins> that is not "no HA"
[16:31] <baranowb> so what did I miss :)
[16:31] <martins> you have HA
[16:32] <martins> because HA is simply load balancing
[16:32] <baranowb> ach, ok, I was thinking more of "create on 1, fire to 2"
[16:32] <martins> what you don't have is support for "loose" load balancing
[16:32] <martins> you need affinity
[16:32] <baranowb> ok, that's cool, and this will work
[16:33] <baranowb> so if this is what we want, then let's move to the second scenario I've been trying to make work
[16:33] <martins> but that was expected right from the beginning, that is why there is also the gravitation option
[16:33] <martins> right? :)
[16:34] <baranowb> y, I had something else in mind by the term HA then :)
[16:34] <baranowb> ok about case #2
[16:34] <martins> ok, so that one works as you explain, it just requires balancing affinity to support failover?
[16:34] <baranowb> y
[16:34] <martins> ok
[16:34] <baranowb> and everything works like a charm
[16:34] <martins> now with gravitation
[16:34] <baranowb> ok, with gravitation it sucks :)
[16:35] <baranowb> it sucks at the point where a new buddy joins
[16:35] <baranowb> data ownership gets screwed
[16:35] <baranowb> totally
[16:35] <baranowb> consider scenario
[16:35] <baranowb> N_1,N_2, N_3
[16:35] <baranowb> N_1 fails, N_2 takes over, as expected
[16:36] <baranowb> ok, at this point, N_2 is the owner, has running timers, whatever
[16:37] <baranowb> we can still fire on N_3, and it works, N_3 just has some entries for new acs under _/ac
[16:37] <baranowb> this is quite good imho
[16:37] <baranowb> but now consider that N_1 rejoins
[16:37] <baranowb> now everything gets screwed due to gravitation
[16:38] <baranowb> N_1's cache takes over all data from the buddy group, I mean totally everything
[16:38] <baranowb> no matter how I set the overrides or invoke CacheData init
[16:38] <martins> so it gets its data back
[16:38] <martins> ?
[16:38] <baranowb> it takes everything
[16:38] <martins> what is everything
[16:39] <baranowb> everything that is in buddy group
[16:39] <baranowb> so it gets data from N_2 and N_3
[16:39] <baranowb> leaving their cache _/ empty
[16:39] <martins> wow
[16:40] <martins> that sounds like a bad config or a bug
[16:40] <baranowb> posted on the jbc forum about it (not sure why the timestamp shows 2 days ago, when the forums did not work)
[16:40] <baranowb> http://community.jboss.org/thread/85420
[16:41] <baranowb> 1st, there is some page in the wiki which says that with gravitation the cache should have a structure like
[16:41] <baranowb> _/node_address/ac
[16:41] <baranowb> found it by accident, dunno why it's not in the user guide
[16:41] <baranowb> 2nd thing, if the cache is set to local mode || overrideoption.setSkipDataGravitation(true)
[16:41] <baranowb> it should not happen
[16:42] <baranowb> but somewhere in the debug log I see that the override options are cleaned and everything is fetched
[16:43] <martins> 1. useless
[16:43] <baranowb> y, we should be able to work with override options
[16:44] <martins> I can't quickly understand your post
[16:44] <baranowb> and it's the same thing as having data in backup without gravitation
[16:44] <martins> you are complaining that once N2 starts N1 gets its data ?
[16:44] <martins> in /BACKUP_...
[16:44] <martins> ?
[16:45] <baranowb> hmm, maybe I should rephrase it then, it's the same scenario
[16:45] <martins> you need to provide simpler use cases
[16:45] <baranowb> when another node starts and joins, it gets the whole data of the buddy group
[16:45] <baranowb> that is
[16:45] <martins> lets say every NODE writes a /ac { x = ip_address }
[16:45] <martins> just data
[16:45] <martins> just that
[16:45] <baranowb> ok
[16:45] <baranowb> so
[16:46] <baranowb> N_1 has
[16:46] <baranowb> (let's consider fqns, not data)
[16:46] <baranowb> N_1
[16:46] <martins> ok then ac/ip_address
[16:46] <baranowb> _/ac/N1_IP
[16:46] <martins> is what each writes
[16:46] <martins> right
[16:47] <baranowb> _/_BUDDY_BACKUP_/_N2_/ac/N_2_IP
[16:47] <baranowb> _/_BUDDY_BACKUP_/_N3_/ac/N_3_IP
[16:47] <baranowb> N_2
[16:47] <baranowb> _/ac/N2_IP
[16:47] <baranowb> _/_BUDDY_BACKUP_/_N1_/ac/N_1_IP
[16:47] <baranowb> _/_BUDDY_BACKUP_/_N3_/ac/N_3_IP
[16:47] <baranowb> similar for N_3
[16:48] <baranowb> right ?
[16:48] <martins> 2 backup nodes of N3 ?
[16:48] <baranowb> N2 and N1
[16:49] <baranowb> N_3
[16:49] <baranowb> _/ac/N3_IP
[16:49] <baranowb> _/_BUDDY_BACKUP_/_N1_/ac/N_1_IP
[16:49] <martins> that is a bit off topic, but in a cluster with N = 3 each node has 2 backups?
[16:49] <baranowb> _/_BUDDY_BACKUP_/_N2_/ac/N_2_IP
[16:49] <baranowb> it's configurable, depends on the number of buddies
[16:49] <martins> ok
[16:49] <baranowb> You can have 1, 2,3....
[16:50] <baranowb> so N_1 dies
[16:50] <baranowb> N_2 is elected as owner
[16:50] <baranowb> so data looks like
[16:50] <baranowb> _/ac/N2_IP
[16:50] <baranowb> _/ac/N1_IP
[16:50] <baranowb> _/_BUDDY_BACKUP_/_N3_/ac/N_3_IP
[16:50] <baranowb> N_3 data
[16:50] <baranowb> _/ac/N3_IP
[16:50] <baranowb> _/_BUDDY_BACKUP_/_N2_/ac/N_2_IP
[16:50] <baranowb> _/_BUDDY_BACKUP_/_N2_/ac/N_1_IP
[16:51] <baranowb> right?
[16:51] <martins> y
[16:51] <baranowb> ok, now N_1 starts, and after start the data looks like
[16:51] <baranowb> N_1 data
[16:51] <baranowb> _/ac/N2_IP
[16:51] <baranowb> _/ac/N1_IP
[16:51] <baranowb> _/ac/N3_IP
[16:51] <baranowb> no backup
[16:52] <baranowb> on N2, or N3
[16:52] <baranowb> _/ac/
[16:52] <baranowb> _/_BUDDY_BACKUP_/_N1_/ac/N_1_IP
[16:52] <baranowb> _/_BUDDY_BACKUP_/_N1_/ac/N_2_IP
[16:52] <baranowb> _/_BUDDY_BACKUP_/_N1_/ac/N_3_IP
[16:54] <martins> that is the normal behavior you get with the default buddy group + data gravitation on config?
[16:55] <baranowb> y, this is somewhat the default way, I mean if there is an override used
[16:55] <baranowb> to control gravitation
[16:55] <martins> let's stick to the default one first
[16:56] <baranowb> so the first scenario is the goal for now?
[16:56] <martins> what is important to know
[16:56] <martins> is if this behavior is expected
[16:56] <martins> if it is not, whose fault it is
[16:56] <martins> if it is, what the reason is, since it doesn't seem logical at first
[16:57] <martins> is there any problem you get by using these default buddy groups with data gravitation?
[16:57] <baranowb> y, I will add a simpler explanation to the post and @ manik directly
[16:57] <baranowb> wdym?
[16:57] <martins> I will ask him and add you to cc, I have something else I need to clarify with him
[16:58] <martins> I mean, is there any issue because when N1 rejoins it sucks in all the data?
[16:58] <baranowb> y
[16:58] <baranowb> see, on failure we look through _BACKUP_
[16:58] <baranowb> and reinit local resources
[16:59] <baranowb> (this happens on the winning node)
[16:59] <baranowb> now, if N_2 fails (the real owner before N_1 sucks it all in)
[16:59] <martins> why do we do that, doesn't jboss cache move the data on its own?
[17:00] <baranowb> 1. there is a bug :), it does not always happen; it's because 3.1.0 has that bug which causes jbc not to fire BG events (hence it does not update its internals and move data)
[17:00] <baranowb> I've reported it and it seems fixed in 3.2.1
[17:00] <baranowb> our listener handles everything, basically it does what the cache should
[17:00] <martins> this also happens without data gravitation?
[17:01] <baranowb> y
[17:01] <baranowb> it's buddy group membership logic
[17:01] <martins> hmm I don't like that
[17:01] <martins> maybe we should move this to post ga
[17:01] <baranowb> 2. it's more efficient to perform it only locally, no network traffic
[17:02] <martins> if jbc internals are not working correctly with the current AS version
[17:02] <baranowb> and we iterate only data that needs to be inspected
[17:02] <martins> for buddy groups
[17:02] <martins> it's a big reason to skip it
[17:02] <baranowb> afaik it's the only thing, at least that I've noticed
[17:03] <baranowb> well one thing is
[17:03] <martins> do you think that this solution going over what jbc does
[17:04] <martins> can be significantly better than cluster wide
[17:04] <martins> for this first version ?
[17:04] <baranowb> our code has one advantage: only one node is the owner of the data
[17:04] <martins> I mean, do you think it is worth
[17:04] <baranowb> in jbc impl, data is copied to _/ of each node
[17:04] <baranowb> and imho this is not what we want
[17:05] <baranowb> data owner is consistent with election policy
[17:05] <baranowb> see
[17:05] <baranowb> in jbc impl
[17:05] <baranowb> on buddy failure, each node moved _/BACKUP_/N/
[17:06] <baranowb> to its _/
[17:06] <baranowb> so if a node fails, and there are two buddies, each has a copy of the failed buddy at its root after the failure
[17:07] <martins> that sounds like another bug
[17:07] <baranowb> and iirc the cluster impl we have expects only the elected node to have it, right?
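The disagreement is about who claims a failed buddy's data: in the observed jbc behavior every buddy moves `_BACKUP_/N/` to its root, while the Mobicents listener expects only the node chosen by the election policy to do so. A sketch of such a listener-side check, assuming a hypothetical `OwnerElection` helper and a lowest-name-wins rule chosen only for illustration (the actual election policy is not shown in this log):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: every surviving buddy of the failed node evaluates the same
// deterministic rule, so exactly one of them claims the data. The rule used here
// (lowest member name wins) is an illustrative assumption, not the Mobicents policy.
class OwnerElection {
    static String electOwner(List<String> survivingBuddies) {
        if (survivingBuddies.isEmpty()) {
            throw new IllegalArgumentException("no buddy left to take ownership");
        }
        List<String> sorted = new ArrayList<>(survivingBuddies);
        Collections.sort(sorted);
        return sorted.get(0);
    }

    // Each buddy runs this in its failure listener; only the winner moves the backup
    // subtree to its root and restores timers, the others leave the backup alone.
    static boolean shouldTakeOver(String self, List<String> survivingBuddies) {
        return self.equals(electOwner(survivingBuddies));
    }

    public static void main(String[] args) {
        List<String> buddiesOfN1 = List.of("N_3", "N_2");
        for (String node : buddiesOfN1) {
            // prints "N_3 takes over N_1: false" then "N_2 takes over N_1: true"
            System.out.println(node + " takes over N_1: " + shouldTakeOver(node, buddiesOfN1));
        }
    }
}
```

Because every buddy applies the same deterministic rule to the same membership view, they agree on a single owner without exchanging extra messages.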
[17:08] <martins> well, does jbc update all nodes when one changes that data?
[17:08] <baranowb> hmm, don't think so, did not see this happen
[17:09] <martins> I mean, if one takes ownership on the mobicents side, such as restoring a timer, and when the timer fires it deletes its data, does jbc delete the data from all nodes where it restored the data?
[17:09] <martins> that is a huge leak
[17:09] <martins> if it doesn't delete in all
[17:10] <baranowb> hehe, don't know, because I assumed in the impl that the others should not get it, so only one node has the data :)
[17:10] <baranowb> possibly gravitation removes it, but can't say for sure
[17:11] <martins> well, it must do the same when a node gets data through gravitation
[17:11] <baranowb> since it's a direct call, it should gravitate the remove operation
[17:11] <martins> when it deletes, it must delete from all
[17:13] <martins> I'm not getting into this buddy group thing with that many "grey" zones
[17:14] <martins> in that N1,N2,N3
[17:14] <martins> when N1 rejoins
[17:15] <martins> does JBC really copy everything to N1 and delete the data from N2 and N3, or is it our code doing the N2 and N3 deletes?
[17:15] <baranowb> it's jbc, I've tried it: I removed our calls to data to see what is going on
[17:15] <baranowb> besides, there is a jbc log
[17:16] <baranowb> about gravitation
[17:16] <martins> does it invoke any callback in N2 and N3 when it does that
[17:16] <baranowb> 16:04:08,234 TRACE [MVCCNodeHelper] Node /ac is not in context, fetching from container.
[17:16] <baranowb> 16:04:08,234 TRACE [DataGravitatorInterceptor] Checking local existence of requested fqn /ac
[17:16] <baranowb> 16:04:08,234 TRACE [DataGravitatorInterceptor] Gravitating from local backup tree
[17:17] <baranowb> 16:04:08,234 TRACE [CallInterceptor] Executing command: GravitateDataCommand{fqn=/ac, searchSubtrees=true}.
[17:18] <baranowb> 16:04:08,265 TRACE [InvocationContextInterceptor] Invoked with command EvictCommand{fqn=/_BUDDY_BACKUP_/127.0.0.1_3273/ac, recursive=true} and InvocationContext [InvocationContext{transaction=TransactionImple < ac, BasicAction: -560196f5:ce0:4b225f4b:63 status: ActionStatus.RUNNING >, globalTransaction=GlobalTransaction:<127.0.0.1:3306>:0, transactionContext=TransactionEntry
[17:18] <baranowb> modificationList: null, optionOverrides=Option{failSilently=false, cacheModeLocal=false, dataVersion=null, suppressLocking=false, lockAcquisitionTimeout=-1, forceDataGravitation=false, skipDataGravitation=false, forceAsynchronous=false, forceSynchronous=false, suppressPersistence=false, suppressEventNotification=false}, originLocal=true, bypassUnmarshalling=false}]
[17:18] <baranowb> 16:04:08,265 TRACE [InvocationContextInterceptor] Setting up transactional context.
[17:18] <baranowb> 16:04:08,265 TRACE [InvocationContextInterceptor] Setting tx as TransactionImple < ac, BasicAction: -560196f5:ce0:4b225f4b:63 status: ActionStatus.RUNNING > and gtx as GlobalTransaction:<127.0.0.1:3306>:0
[17:18] <baranowb> 16:04:08,265 TRACE [TxInterceptor] local transaction exists - registering global tx if not present for Thread[main,5,jboss]
[17:18] <baranowb> 16:04:08,265 TRACE [TxInterceptor] Associated gtx in txTable is GlobalTransaction:<127.0.0.1:3306>:0
[17:18] <baranowb> 16:04:08,265 TRACE [TxInterceptor] Transaction TransactionImple < ac, BasicAction: -560196f5:ce0:4b225f4b:63 status: ActionStatus.RUNNING > is already registered.
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCLockManager] Attempting to lock /_BUDDY_BACKUP_/127.0.0.1_3273/ac
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCLockManager] Attempting to lock /_BUDDY_BACKUP_/127.0.0.1_3273/ac/ACH=.....
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCNodeHelper] Retrieving wrapped node /_BUDDY_BACKUP_/127.0.0.1_3273/ac/ACH=.....
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCLockManager] Attempting to lock /_BUDDY_BACKUP_/127.0.0.1_3273/ac/ACH=..../...
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCLockManager] Attempting to lock /_BUDDY_BACKUP_/127.0.0.1_3273/ac/ACH=.../.../...
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCLockManager] Attempting to lock /_BUDDY_BACKUP_/127.0.0.1_3273/ac/ACH=.../...
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCLockManager] Attempting to lock /_BUDDY_BACKUP_/127.0.0.1_3273/ac/ACH=/../.../...
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCLockManager] Attempting to lock /_BUDDY_BACKUP_/127.0.0.1_3273/ac/ACH=...
[17:18] <baranowb> 16:04:08,265 TRACE [MVCCNodeHelper] Retrieving wrapped node /_BUDDY_BACKUP_/127.0.0.1_3273/ac/ACH=SERVICE>>ServiceID[name=TimerExampleService,vendor=org.mobicents,version=1.0]
[17:18] <baranowb> 16:04:08,265 TRACE [CallInterceptor] Executing command: EvictCommand{fqn=/_BUDDY_BACKUP_/127.0.0.1_3273/ac, recursive=true}.
[17:18] <baranowb> bla bla bla and so on
[17:19] <baranowb> I dont have log for other nodes
[17:19] <martins> which node is that
[17:19] <baranowb> N_1
[17:19] <martins> what is 3273
[17:19] <martins> N3?
[17:19] <baranowb> port
[17:19] <baranowb> y, buddies are stored as
[17:19] <baranowb> IP_PORT
[17:19] <martins> I know it is port :p
[17:19] <baranowb> :)
[17:20] <baranowb> it's N3 I think
[17:20] <martins> that is on rejoin ?
[17:20] <baranowb> y
[17:21] <martins> you are fetching data from N3
[17:21] <martins> and it has the data on back tree
[17:21] <martins> what is wrong there
[17:21] <baranowb> it's a local call, and gravitation should be suppressed
[17:22] <baranowb> for this case
[17:22] <martins> why is it local
[17:22] <baranowb> because we should not get /ac
[17:22] <baranowb> from all other nodes
[17:23] <martins> you need to get AC to get a child
[17:23] <baranowb> well thats not entirely true - see
[17:23] <baranowb> when all nodes are running
[17:23] <baranowb> one dies
[17:23] <baranowb> N2 takes over
[17:23] <baranowb> we can create on N3 under /ac
[17:24] <baranowb> nothing happens, no data sucking, no nothing, just creating something under /ac
[17:24] <martins> that is because ac already exists?
[17:25] <baranowb> y
[17:25] <baranowb> at least I suspect that's the cause
[17:25] <baranowb> there is /ac in the local cache instance
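The behavior described here, together with the rejoin case above, can be summarized with a simplified model of data gravitation. It is a hypothetical sketch, not JBoss Cache code: a read first checks the reader's own root, and only when the FQN is missing does it pull every matching entry (compare `searchSubtrees=true` in the TRACE log) from all roots and backup trees into the reader's root, removing it at the source. `GravitationModel`, `read` and its `skipDataGravitation` parameter are illustrative names; the parameter only mirrors the `Option` field seen in the log, and the model ignores locking, transactions and the re-creation of backups after gravitation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of data gravitation across a small cluster (not JBoss Cache code).
// Each node's tree is flattened to FQN string -> value; backups live under BACKUP.
class GravitationModel {
    static final String BACKUP = "/_BUDDY_BACKUP_/";

    final Map<String, TreeMap<String, String>> nodes = new LinkedHashMap<>();

    TreeMap<String, String> tree(String node) {
        return nodes.computeIfAbsent(node, n -> new TreeMap<>());
    }

    static boolean under(String key, String fqn) {
        return key.equals(fqn) || key.startsWith(fqn + "/");
    }

    // "/_BUDDY_BACKUP_/<owner>/ac/x" -> "/ac/x"; keys outside the backup area are unchanged.
    static String unwrap(String key) {
        if (!key.startsWith(BACKUP)) {
            return key;
        }
        String rest = key.substring(BACKUP.length());
        int slash = rest.indexOf('/');
        return slash < 0 ? key : rest.substring(slash);
    }

    // Reads fqn on `reader` and returns how many FQNs were gravitated into its root.
    int read(String reader, String fqn, boolean skipDataGravitation) {
        TreeMap<String, String> local = tree(reader);
        for (String key : local.keySet()) {
            if (under(key, fqn)) {
                return 0; // already present in the reader's own root: no gravitation
            }
        }
        if (skipDataGravitation) {
            return 0;
        }
        // Gather every matching entry from all roots and backup trees, removing it
        // at the source, so the reader becomes the only owner of that data.
        Map<String, String> gathered = new TreeMap<>();
        for (TreeMap<String, String> t : nodes.values()) {
            List<String> hits = new ArrayList<>();
            for (String key : t.keySet()) {
                if (under(unwrap(key), fqn)) {
                    hits.add(key);
                }
            }
            for (String key : hits) {
                gathered.put(unwrap(key), t.remove(key));
            }
        }
        local.putAll(gathered);
        return gathered.size();
    }

    public static void main(String[] args) {
        GravitationModel c = new GravitationModel();
        // State after N_1 failed and N_2 took over, as in the FQN dump above.
        c.tree("N_2").put("/ac/N1_IP", "x");
        c.tree("N_2").put("/ac/N2_IP", "x");
        c.tree("N_2").put("/_BUDDY_BACKUP_/N_3/ac/N3_IP", "x");
        c.tree("N_3").put("/ac/N3_IP", "x");
        c.tree("N_3").put("/_BUDDY_BACKUP_/N_2/ac/N1_IP", "x");
        c.tree("N_3").put("/_BUDDY_BACKUP_/N_2/ac/N2_IP", "x");
        System.out.println(c.read("N_3", "/ac", false)); // prints 0: N_3 already has /ac
        System.out.println(c.read("N_1", "/ac", false)); // prints 3: rejoined N_1 pulls everything
        System.out.println(c.tree("N_1").keySet());     // prints [/ac/N1_IP, /ac/N2_IP, /ac/N3_IP]
        System.out.println(c.tree("N_2").keySet());     // prints []
    }
}
```

In this model a `true` flag leaves every tree untouched, which is the effect baranowb expected from `setSkipDataGravitation(true)`; the TRACE log above shows `skipDataGravitation=false` at the moment of gravitation, consistent with his report that the overrides were cleared.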
[17:26] <martins> baranowb, I think there is much to uncover and work on in this jbc config
[17:27] <martins> I feel it is far from a stable usable config for our HA
[17:27] <martins> do you agree?
[17:27] <baranowb> with gravitation I agree,
[17:28] <baranowb> the first case could be of use, it seems to work ok
[17:28] <baranowb> and the result is deterministic
[17:29] <martins> I'm not sure
[17:29] <baranowb> but if there is no urgency, we can leave it, I still have a few valid fixes
[17:29] <martins> it looks like a use case that may be more troublemaking than life-saving
[17:29] <martins> it would be preferable to split a cluster into smaller clusters
[17:30] <baranowb> ok, I agree, we can freeze this issue with patch
[17:30] <baranowb> yes, exactly
[17:30] <martins> don't freeze, we need to continue working on this in parallel
[17:30] <baranowb> freeze for this release I mean :)
[17:31] <martins> till we have all our doubts answered
[17:31] <martins> the buddy group + gravitation has the potential to become the default setup
[17:31] <martins> and thus we should make it a big priority in HA
[17:32] <martins> well, we should point it for what kind of release, 2.1 ?
[17:33] <baranowb> y