[
https://issues.apache.org/jira/browse/AMQ-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655811#comment-16655811
]
Jeff Genender commented on AMQ-7080:
------------------------------------
This looks good to me. My only comment is that the fingerprint shouldn't be a
random value, because of potential clashes. I would use a time in millis or a
time-based UUID. Anything that has an excellent chance of being unique.
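
As a rough illustration of the "time in millis" idea, here is a minimal Java
sketch. The class and method names are hypothetical and not part of KahaDB.
It assumes a single JVM writes the fingerprint, and it bumps the value by one
when two calls land in the same millisecond or the clock steps backwards.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, for illustration only; not an ActiveMQ class.
final class FreePageIdSketch {
    // Last id handed out by this JVM.
    private static final AtomicLong LAST_ID = new AtomicLong();

    // Returns the current time in millis, or the previous id + 1 if that
    // is larger, so ids are strictly increasing within this JVM.
    static long nextFreePageUniqueId() {
        return LAST_ID.updateAndGet(
                prev -> Math.max(prev + 1, System.currentTimeMillis()));
    }

    public static void main(String[] args) {
        long first = nextFreePageUniqueId();
        long second = nextFreePageUniqueId();
        System.out.println(first + " then " + second);
    }
}
```

Note that java.util.UUID.randomUUID() produces a random (version 4) UUID, so a
time-based UUID would need its own generator or a third-party library.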
[~cshannon], you made some good comments about ACTIVEMQ_KILL_MAXSECONDS, which
can be an issue. Luckily that setting is an easily changeable parameter.
However, what are your and [~gtully]'s thoughts on removing it from
'activemq stop', letting it stop normally, and perhaps creating a new invoker,
call it 'activemq force-stop', which could use the ACTIVEMQ_KILL_MAXSECONDS
parameter? It seems to me that it's been a long while since I have actually
seen ActiveMQ "hang" on its own, and slow shutdowns have been the consequence
of it doing its thing. Any thoughts/opinions on this? I would be happy to do
it if you guys find value in this.
> Keep track of free pages - Update db.free file during checkpoints
> -----------------------------------------------------------------
>
> Key: AMQ-7080
> URL: https://issues.apache.org/jira/browse/AMQ-7080
> Project: ActiveMQ
> Issue Type: Improvement
> Components: KahaDB
> Affects Versions: 5.15.6
> Reporter: Alan Protasio
> Priority: Major
>
> In the event of an unclean shutdown, ActiveMQ loses the information about the
> free pages in the index. In order to recover this information, ActiveMQ reads
> the whole index during shutdown, searching for free pages, and then saves the
> db.free file. This operation can take a long time, making the failover
> slower (during the shutdown, ActiveMQ still holds the lock).
> From http://activemq.apache.org/shared-file-system-master-slave.html
> {quote}"If you have a SAN or shared file system it can be used to provide
> high availability such that if a broker is killed, another broker can take
> over immediately."
> {quote}
> It is important to note that if the shutdown takes more than
> ACTIVEMQ_KILL_MAXSECONDS seconds, any following shutdown will be unclean. The
> broker will stay in this state unless the index is deleted (this state means
> that every failover will take more than ACTIVEMQ_KILL_MAXSECONDS, so if you
> increase this time to 5 minutes, your failover can take more than 5 minutes).
>
> In order to prevent ActiveMQ from reading the whole index file to search for
> free pages, we can keep track of them on every checkpoint. To do that, we
> need to be sure that db.data and db.free are in sync. To achieve that, we can
> have an attribute in the db.free page that is referenced by db.data.
> So during the checkpoint we have:
> 1 - Save db.free and give a freePageUniqueId
> 2 - Save this freePageUniqueId in the db.data (metadata)
> After a crash, we can check whether db.data has the same freePageUniqueId as
> db.free. If so, we can safely use the free page information contained in
> db.free.
> Now, the only case where we have to read the whole index file again is if
> the crash happens between steps 1 and 2 (which is very unlikely).
> The drawback of this implementation is that we will have to save db.free
> during the checkpoint, which can increase the checkpoint time.
> It is also important to note that we CAN (and should) have stale data in
> db.free, as it is referencing stale db.data:
> Imagine the timeline:
> T0 -> P1, P2 and P3 are free.
> T1 -> Checkpoint
> T2 -> P1 got occupied.
> T3 -> Crash
> In the current scenario, after Pagefile#load, P1 will be free, and then
> the replay will mark P1 as occupied or will occupy another page (now that
> the recovery of free pages is done on shutdown).
> This change only makes sure that db.data and db.free are in sync and show
> the reality at T1 (checkpoint). If they are in sync, we can trust db.free.
> This is a really fast draft of what I'm suggesting... If you guys agree, I
> can create the proper patch afterwards:
> [https://github.com/alanprot/activemq/commit/18036ef7214ef0eaa25c8650f40644dd8b4632a5]
>
> This is related to https://issues.apache.org/jira/browse/AMQ-6590
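
The two checkpoint steps and the post-crash check from the issue description
can be sketched roughly as follows. This is a minimal in-memory model, not the
linked draft commit: the class, fields and methods are hypothetical stand-ins
for db.free and the db.data metadata, and the crashBetweenSteps flag exists
only to simulate a crash between step 1 and step 2.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model, for illustration only; not KahaDB code.
final class FreeListCheckpointSketch {
    static final long NONE = -1;

    // Free pages as the running broker currently sees them.
    final Set<Long> liveFreePages = new TreeSet<>();

    // Stand-in for db.free: the saved free list plus its unique id.
    Set<Long> savedFreePages = new TreeSet<>();
    long dbFreeUniqueId = NONE;

    // Stand-in for the freePageUniqueId kept in the db.data metadata.
    long dbDataFreePageUniqueId = NONE;

    private long nextId = 1;

    void checkpoint(boolean crashBetweenSteps) {
        long id = nextId++;
        // Step 1: save db.free and give it a freePageUniqueId.
        savedFreePages = new TreeSet<>(liveFreePages);
        dbFreeUniqueId = id;
        if (crashBetweenSteps) {
            return; // simulated crash: step 2 never happens
        }
        // Step 2: save the same freePageUniqueId in db.data (metadata).
        dbDataFreePageUniqueId = id;
    }

    // After a crash: db.free is usable only if both ids match.
    boolean canTrustDbFree() {
        return dbFreeUniqueId != NONE
                && dbFreeUniqueId == dbDataFreePageUniqueId;
    }

    public static void main(String[] args) {
        FreeListCheckpointSketch s = new FreeListCheckpointSketch();
        s.liveFreePages.add(1L);      // T0: P1, P2 and P3 are free
        s.liveFreePages.add(2L);
        s.liveFreePages.add(3L);
        s.checkpoint(false);          // T1: checkpoint
        s.liveFreePages.remove(1L);   // T2: P1 got occupied
        // T3: crash; the saved list still shows P1 as free, as of T1
        System.out.println("trust db.free: " + s.canTrustDbFree());
        System.out.println("saved free pages: " + s.savedFreePages);
    }
}
```

In this model the saved list is deliberately stale after T2. It matches the
state of db.data at T1, which is the property the id comparison checks.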