[
https://issues.apache.org/jira/browse/KARAF-5798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533032#comment-16533032
]
Matthew Zipay commented on KARAF-5798:
--------------------------------------
>> So the pid written by {{InstanceHelper.updateInstancePid}} should reflect
>>the current running master
This is problematic, I think. It would suffer from the same drawback that the
pid and port files do - the slave instance would always have karaf.pid, port
_and_ instance.properties files that contain incorrect values!
That makes administration confusing at best, because now an administrator needs
to inspect with {{ps}} and {{lsof}} to discover the correct values (and, of
course, neither the status nor the stop command works until those values are
corrected).
To put this into context: I have a master/slave node pair running right now.
The slave's karaf.pid file contains 19931. That is not correct. There is no
process 19931 running on that host. The slave's port file contains 33415. That
is also incorrect. There is nothing listening on port 33415 on that host.
(These are values from a previous instance that is no longer running.) As a
result, neither org.apache.karaf.main.Status nor org.apache.karaf.main.Stop
works.
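For anyone who wants to confirm the same symptom on their own host, the check
I did by hand with {{ps}} and {{lsof}} can be approximated in a few lines of
Java. This is only a rough sketch, not Karaf code: the class and method names
are made up for illustration, it assumes each file contains nothing but the
number, and it assumes the shutdown port is reachable on 127.0.0.1. It needs
Java 11 or later.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Karaf.
final class StaleFileCheck {

    /** True if a process with the pid recorded in pidFile exists on this host. */
    static boolean pidIsLive(Path pidFile) throws IOException {
        long pid = Long.parseLong(Files.readString(pidFile).trim());
        return ProcessHandle.of(pid).isPresent();
    }

    /** True if something accepts TCP connections on the port recorded in portFile. */
    static boolean portIsListening(Path portFile) throws IOException {
        int port = Integer.parseInt(Files.readString(portFile).trim());
        try (Socket socket = new Socket()) {
            // Assumption: the listener is bound to (or reachable via) loopback.
            socket.connect(new InetSocketAddress("127.0.0.1", port), 1000);
            return true;
        } catch (IOException e) {
            // Connection refused or timed out: nothing usable is listening there.
            return false;
        }
    }
}
```

On the slave described above, I would expect both methods to return false for
the karaf.pid and port files, even though the slave JVM is up.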
If I stop the master instance, then the slave acquires the lock, becomes
master, and the files get written with correct values. But I don't think this
makes sense. The slave *is* a running process. Why should its karaf.pid not
reflect the correct value? Likewise, the slave *has* a shutdown port - why
should its port file not have the correct value?
IMO, all three of these files should be written before lock acquisition is even
attempted, because none of these values have anything to do with master/slave
status - they are OS administration values that are needed to manage a JVM
process.
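To illustrate the ordering I have in mind, here is a minimal sketch. It is not
the actual karaf-boot code: {{StartupOrderSketch}}, {{writeValue}} and
{{start}} are hypothetical names, and {{tryLock}} simply stands in for
whatever lock implementation is configured. Only the pid and port files are
shown; instance.properties would be updated at the same point.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the proposed ordering, not the real Karaf start-up code.
final class StartupOrderSketch {

    /** Writes a single value to a file, replacing any stale content from a previous run. */
    static void writeValue(Path file, long value) throws IOException {
        Files.writeString(file, Long.toString(value));
    }

    /**
     * Records the pid and shutdown port first, then polls for the lock.
     * tryLock is a placeholder for the configured lock; delayMillis is the poll interval.
     */
    static void start(Path pidFile, Path portFile, int shutdownPort,
                      BooleanSupplier tryLock, long delayMillis)
            throws IOException, InterruptedException {
        // These describe the JVM process itself, so they are valid for master and slave alike.
        writeValue(pidFile, ProcessHandle.current().pid());
        writeValue(portFile, shutdownPort);

        // Only now wait for the lock; a slave may sit in this loop indefinitely.
        while (!tryLock.getAsBoolean()) {
            Thread.sleep(delayMillis);
        }
        // Master-only start-up would continue from here.
    }
}
```

With this ordering, a slave that never acquires the lock still has pid and
port files that match the running process, so the status and stop commands
have correct values to work with.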
> Karaf slave instance does not write pid or port file until it becomes master
> ----------------------------------------------------------------------------
>
> Key: KARAF-5798
> URL: https://issues.apache.org/jira/browse/KARAF-5798
> Project: Karaf
> Issue Type: Bug
> Components: karaf-boot
> Affects Versions: 4.0.9
> Reporter: Matthew Zipay
> Assignee: Jean-Baptiste Onofré
> Priority: Major
>
> In a Karaf master/slave environment, the slave process does not write its pid
> or port file until it acquires the lock and becomes the master.
> I am running Karaf 4.0.9 (ServiceMix 7.0.1).
> Karaf is configured as master/slave using the following from
> system.properties. Master and slave are on different physical nodes.
> {code:java}
> karaf.lock=true
> karaf.lock.class=org.apache.karaf.main.lock.OracleJDBCLock
> karaf.lock.level=79
> karaf.lock.delay=10000
> karaf.lock.jdbc.url=jdbc:oracle:thin:#REMOVED#
> karaf.lock.jdbc.driver=oracle.jdbc.driver.OracleDriver
> karaf.lock.jdbc.user=#REMOVED#
> karaf.lock.jdbc.password=#REMOVED#
> karaf.lock.jdbc.table=KARAF_LOCK
> karaf.lock.jdbc.clustername=karaf
> karaf.lock.jdbc.timeout=30
> karaf.lock.slave.block=false
> {code}
> Attempting to stop the slave Karaf process results in _"Can't connect to the
> container. The container is not running."_ This is not true, as a simple
> {{ps -ef | grep karaf}} confirms that it is in fact running. I am able to
> enter the Karaf shell just fine, use the web console, etc.
> I have confirmed through multiple tests that the pid and port files don't get
> written until the master lock is acquired.
> Steps:
> # With the Karaf slave node not started, note the pid and port files do not
> exist (or contain outdated values from a previous process).
> # Start the Karaf slave process.
> # Note that the pid and port files have not been written.
> # Stop the master process.
> # Observe the slave process acquire the lock and become master.
> # Note that the pid and port files have now been written.