[ https://issues.apache.org/jira/browse/KARAF-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15438650#comment-15438650 ]
Sergiy Shyrkov commented on KARAF-4664:
---------------------------------------
Hello JB,
I think we have been able to move forward with investigating this issue on our
side.
In our case we have two cluster nodes, and sometimes, on startup of the second
node, we see bundles on the first node getting stopped.
Enabling DEBUG logging for Cellar reveals that, on startup of the
org.apache.karaf.cellar.bundle bundle, the activator calls
BundleSynchronizer.init(), which, after pulling the cluster state, executes
push() (we are using the "cluster" sync strategy).
{code}
BundleSynchronizer: CELLAR BUNDLE: updating cluster from the local node (push after)
BundleSynchronizer: CELLAR BUNDLE: pushing bundles to cluster group default
BundleSynchronizer: CELLAR BUNDLE: updating bundle event/2.0.2 on the cluster
BundleSynchronizer: CELLAR BUNDLE: updating bundle facets/7.1.2 on the cluster
...
{code}
The problem is that at that point our bundles are not yet fully started, so
the BundleSynchronizer "sees" that their local state differs from the cluster
state and pushes the update to the cluster (which stops them on the other node).
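To make the race concrete, here is a deliberately simplified model of the sequence described above. It is not Cellar's code: `SimNode`, `pull()`, `push()`, `applyClusterState()` and the bundle names are hypothetical, and bundle state is reduced to ACTIVE/RESOLVED. We assume that a pull calls start() on bundles the cluster marks as active, and we mirror the OSGi rule that a bundle is not actually started until the framework's active start level reaches the bundle's own start level.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Reduced bundle states; real OSGi bundles also have INSTALLED, STARTING, STOPPING, ...
enum SimState { ACTIVE, RESOLVED }

// Hypothetical model of one cluster node. This is NOT Cellar's BundleSynchronizer.
class SimNode {
    final int activeStartLevel;                            // framework's current start level
    final Map<String, Integer> startLevels = new TreeMap<>();
    final Map<String, SimState> local = new TreeMap<>();

    SimNode(int activeStartLevel) { this.activeStartLevel = activeStartLevel; }

    void install(String bundle, int startLevel, SimState state) {
        startLevels.put(bundle, startLevel);
        local.put(bundle, state);
    }

    // Mirrors OSGi semantics: start() only takes effect once the active start
    // level has reached the bundle's start level; otherwise it stays RESOLVED.
    void start(String bundle) {
        Integer level = startLevels.get(bundle);
        if (level != null && level <= activeStartLevel) {
            local.put(bundle, SimState.ACTIVE);
        }
    }

    // "pull": try to make the local state match the cluster state.
    void pull(Map<String, SimState> cluster) {
        for (Map.Entry<String, SimState> e : cluster.entrySet()) {
            if (e.getValue() == SimState.ACTIVE) start(e.getKey());
            else local.put(e.getKey(), SimState.RESOLVED);
        }
    }

    // "push": publish every local state to the cluster, overwriting it.
    void push(Map<String, SimState> cluster) { cluster.putAll(local); }

    // Reaction to a cluster update: stop local bundles the cluster now marks
    // RESOLVED (starting bundles is omitted for brevity).
    void applyClusterState(Map<String, SimState> cluster) {
        for (Map.Entry<String, SimState> e : cluster.entrySet()) {
            if (e.getValue() == SimState.RESOLVED) local.put(e.getKey(), SimState.RESOLVED);
        }
    }
}

class RaceDemo {
    // Returns node1's local states after node2 joins with the "cluster" policy.
    static Map<String, SimState> run() {
        Map<String, SimState> cluster = new HashMap<>();
        SimNode node1 = new SimNode(100);                  // fully started
        node1.install("cellar", 80, SimState.ACTIVE);
        node1.install("module", 90, SimState.ACTIVE);
        node1.push(cluster);                               // cluster: both ACTIVE

        SimNode node2 = new SimNode(80);                   // Cellar starting at level 80
        node2.install("cellar", 80, SimState.ACTIVE);
        node2.install("module", 90, SimState.RESOLVED);

        node2.pull(cluster);   // "module" cannot start yet: 90 > 80
        node2.push(cluster);   // publishes module=RESOLVED ("push after")
        node1.applyClusterState(cluster);
        return node1.local;
    }
}
```

In this model, `RaceDemo.run()` leaves "module" RESOLVED on node1 although nobody stopped it deliberately, which matches the symptom we observe.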
I guess it is related to the fact that our bundles (modules) have a start level
of 90, whereas the Cellar bundles have the default one: 80.
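This is why the start levels matter: in OSGi, a bundle is only started once the framework's active start level has reached the bundle's own start level, so while Cellar (level 80) runs init(), our level-90 bundles are still RESOLVED. A hypothetical guard (not part of Cellar) could skip pushing bundles whose start level the framework has not reached yet. The sketch below models only that check with plain values; in a real bundle the levels would come from the OSGi `FrameworkStartLevel` and `BundleStartLevel` adapters. `StartLevelGuard`, `isSettled` and `pushable` are illustrative names.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical push filter; NOT part of Cellar.
class StartLevelGuard {
    // A bundle's local state is only meaningful for syncing once the framework's
    // active start level has reached the bundle's start level.
    static boolean isSettled(int bundleStartLevel, int activeStartLevel) {
        return bundleStartLevel <= activeStartLevel;
    }

    // Returns the names of the bundles that would be safe to push right now,
    // in the iteration order of the given map.
    static List<String> pushable(Map<String, Integer> bundleStartLevels, int activeStartLevel) {
        return bundleStartLevels.entrySet().stream()
                .filter(e -> isSettled(e.getValue(), activeStartLevel))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

With the framework at level 80, only a level-80 bundle passes the filter; at level 100 a level-90 bundle passes too. Skipping unsettled bundles, rather than pushing them as RESOLVED, leaves their cluster state untouched until a later push.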
Do you perhaps have any hints on how we could "postpone"
BundleSynchronizer.push() until the end of startup?
For example, by waiting for the STARTLEVEL_CHANGED event in a framework
listener, and temporarily preventing BundleSynchronizer from syncing the state
until then (using another sync policy, say "clusterOnly", or in some other way).
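The idea of deferring push() until startup completes can be sketched as a small gate. In an OSGi bundle, `onStartLevelChanged` would be called from a `FrameworkListener` (registered with `BundleContext.addFrameworkListener`) on `FrameworkEvent.STARTLEVEL_CHANGED`, reading the current level from the system bundle's `FrameworkStartLevel.getStartLevel()`. To keep the example runnable without OSGi jars, the gate takes the level as a plain int. `DeferredPushGate`, `requestPush` and the target level used below (100) are assumptions for illustration, not Cellar API. Because it depends on the launcher which framework events are emitted during boot, the gate also checks the level once at construction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT Cellar API: defers push() until the framework has
// reached a target start level.
class DeferredPushGate {
    private final int targetStartLevel;
    private final List<Runnable> pending = new ArrayList<>();
    private boolean open;

    DeferredPushGate(int targetStartLevel, int currentStartLevel) {
        this.targetStartLevel = targetStartLevel;
        // The target may already be reached, e.g. when the bundle is restarted later.
        this.open = currentStartLevel >= targetStartLevel;
    }

    // Called instead of push(): runs immediately if startup is complete,
    // otherwise queues the push until the target level is reached.
    synchronized void requestPush(Runnable push) {
        if (open) push.run();
        else pending.add(push);
    }

    // In OSGi this would be driven by a FrameworkListener on STARTLEVEL_CHANGED,
    // passing the level read from FrameworkStartLevel.getStartLevel().
    synchronized void onStartLevelChanged(int newStartLevel) {
        if (open || newStartLevel < targetStartLevel) return;
        open = true;
        for (Runnable push : pending) push.run();   // replay queued pushes in order
        pending.clear();
    }
}
```

Queued pushes are replayed once, in order, after the target level is reached. This only helps if the queued `Runnable` reads the local bundle states when it runs, not when it was queued.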
We could perhaps try redefining the Cellar feature on our side to use a higher
start level for the Cellar bundles, if that would solve the issue.
Thank you in advance!
> Cellar can stop bundles whereas it should not
> ---------------------------------------------
>
> Key: KARAF-4664
> URL: https://issues.apache.org/jira/browse/KARAF-4664
> Project: Karaf
> Issue Type: Bug
> Components: cellar-bundle, cellar-features
> Affects Versions: cellar-4.0.1
> Reporter: Jean-Baptiste Onofré
> Assignee: Jean-Baptiste Onofré
> Fix For: cellar-4.0.2
>
>
> Suppose we have a cluster of 2 nodes running, both with the same bundles
> deployed and active. We stop the second node, then the first one. We start
> the first node, wait until it is fully initialized, then start the second
> one. The issue is that sometimes the second node's startup causes a few
> bundles on the first node to stop.
> My guess is that the second node, when joining the cluster, pushes its state
> to the cluster while some of its bundles are still starting and in the
> RESOLVED state, which causes the corresponding bundles on the first node to
> stop and stay RESOLVED too. My hypothesis is supported by the fact that once
> I change the "default.*.sync" options in "org.apache.karaf.cellar.groups.cfg"
> from "cluster" to "clusterOnly", the issue stops happening. However, I think
> we should still support the "cluster" synchronization strategy if possible.
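The difference between the two policies, as we understand it from this issue, fits in a few lines: "cluster" pulls the cluster state and then pushes the local state back (the "push after" in the DEBUG log), while "clusterOnly" does not push. The enum below is a hypothetical model of that understanding, not Cellar's implementation; `SyncPolicy`, `fromConfig` and `startupSteps` are illustrative names, and the other Cellar sync policies are deliberately left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the startup steps per sync policy, as described in this
// issue; NOT Cellar's code, and other policies are deliberately left out.
enum SyncPolicy {
    CLUSTER("cluster", true),
    CLUSTER_ONLY("clusterOnly", false);

    final String configValue;      // value as written for default.*.sync
    final boolean pushAfterPull;   // whether the local state is pushed after the pull

    SyncPolicy(String configValue, boolean pushAfterPull) {
        this.configValue = configValue;
        this.pushAfterPull = pushAfterPull;
    }

    static SyncPolicy fromConfig(String value) {
        for (SyncPolicy p : values()) {
            if (p.configValue.equals(value)) return p;
        }
        throw new IllegalArgumentException("sync policy not modelled here: " + value);
    }

    // Steps a synchronizer's init() would run on startup under this policy.
    List<String> startupSteps() {
        return pushAfterPull ? Arrays.asList("pull", "push") : Collections.singletonList("pull");
    }
}
```

Under this model, only the "cluster" policy has a push step that can publish a half-started local state, which is consistent with the issue disappearing under "clusterOnly".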
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)