Stephanie Cowie created AMQ-8140:
------------------------------------
Summary: Lock Keep Alive does not work on Windows Server 2019
Key: AMQ-8140
URL: https://issues.apache.org/jira/browse/AMQ-8140
Project: ActiveMQ
Issue Type: Bug
Components: Broker
Affects Versions: 5.15.8, 5.16.1
Environment: Windows Server 2019, three-node master/slave with a shared
folder.
KahaDB persistence
Directory is a shared folder on a file server
lockKeepAlivePeriod set to 5 seconds
Reporter: Stephanie Cowie
The deployment is master/slave with a shared folder. The broker is configured
with a KahaDB persistenceAdapter whose directory is a shared folder on a file
server, and lockKeepAlivePeriod is set to 5 seconds. The broker runs on three
application nodes. On Windows Server 2012, the master retains the lock. On
Windows Server 2019, the master cannot retain the lock and relinquishes the
master role. A slave node then obtains the lock but, within 1 to 40 seconds,
detects that the file's lastModified timestamp no longer matches the value
stored when the lock was obtained. As a result, all three nodes repeatedly
acquire the lock, detect a change, and give the lock up.
Adding logging to org.apache.activemq.util.LockFile showed that the root cause
is that the File's lastModified property is not updated immediately, so the
timestamp the master stores does not match the actual modification time. For
example, if the file's lastModified timestamp was 13:00 on February 2nd 2021
before the broker obtained the lock at 08:00 on February 3rd 2021, the stored
timestamp is 13:00 on February 2nd 2021. Each keep-alive check compares the
File's lastModified timestamp against the stored value. Eventually lastModified
returns the actual time the file was locked, 08:00 on February 3rd 2021.
Because this does not match the stored value, the broker gives up being
master, which triggers complete failure across all three nodes.
The LockFile code writes the current time in milliseconds to the file through
the RandomAccessFile object. Adding code to read that data back showed that
the write was persisted immediately, unlike the metadata. Changing the code to
store the written time in LockFile.lock(), and to read it back from the file in
LockFile.hasBeenModified() to determine whether the file has changed, rather
than relying on the lastModified property, resolves the issue.
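A minimal sketch of the content-based check described above, assuming the lock file holds the timestamp as an 8-byte long at offset 0. `ContentCheckedLock`, `stamp()` and this file layout are hypothetical names chosen for illustration. This is not the actual `LockFile` patch, and it omits the file-locking step a real broker lock needs.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical sketch: detect lock-file changes by content, not by lastModified. */
public class ContentCheckedLock implements AutoCloseable {
    private final RandomAccessFile raf;
    private long writtenTime = Long.MIN_VALUE; // nothing written yet

    public ContentCheckedLock(File file) throws IOException {
        this.raf = new RandomAccessFile(file, "rw");
    }

    /** Called on lock acquisition: write the current time and remember the value written. */
    public void stamp() throws IOException {
        long now = System.currentTimeMillis();
        raf.seek(0);
        raf.writeLong(now);
        raf.getFD().sync(); // ask the OS to flush the written bytes to the device
        writtenTime = now;
    }

    /** Keep-alive check: re-read the timestamp from the file and compare it to what we wrote. */
    public boolean hasBeenModified() throws IOException {
        if (raf.length() < Long.BYTES) {
            return true; // empty or truncated file: treat as modified
        }
        raf.seek(0);
        return raf.readLong() != writtenTime;
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }
}
```

If another node takes the lock and stamps its own time, the master's next read returns a different value and it steps down; if nothing else wrote to the file, the check passes regardless of when the share updates lastModified.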
--
This message was sent by Atlassian Jira
(v8.3.4#803005)