Alan Protasio created AMQ-7080:
----------------------------------

             Summary: Keep track of free pages - Update db.free file during 
checkpoints
                 Key: AMQ-7080
                 URL: https://issues.apache.org/jira/browse/AMQ-7080
             Project: ActiveMQ
          Issue Type: Improvement
          Components: KahaDB
    Affects Versions: 5.15.6
            Reporter: Alan Protasio


In the event of an unclean shutdown, ActiveMQ loses the information about the
free pages in the index. To recover this information, ActiveMQ reads the whole
index during shutdown, searching for free pages, and then saves the db.free
file. This operation can take a long time, making failover slower (during the
shutdown, ActiveMQ still holds the lock).
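To make the cost concrete, here is a minimal sketch of what such a recovery scan amounts to. It assumes a hypothetical in-memory model where the index is an array of page states (`PageState` and `FreePageScan` are illustrative names, not KahaDB classes; the real `PageFile` reads each page from disk). The only point is that, without a trustworthy db.free, every page has to be visited.

```java
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical page states; the real KahaDB page header encoding differs. */
enum PageState { FREE, IN_USE }

/** Simplified model of the free-page recovery scan (not ActiveMQ's actual code). */
final class FreePageScan {
    private FreePageScan() {}

    /**
     * Visits every page of the index and collects the ids of the free ones.
     * The cost is O(number of pages): every page has to be read, which is
     * what makes this recovery slow on a large index.
     */
    static SortedSet<Long> recoverFreePages(PageState[] index) {
        SortedSet<Long> free = new TreeSet<>();
        for (int pageId = 0; pageId < index.length; pageId++) {
            if (index[pageId] == PageState.FREE) {
                free.add((long) pageId);
            }
        }
        return free;
    }
}
```

For example, a hypothetical 10 GiB index split into 4 KiB pages means 2,621,440 page reads before the free list is known again.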

From http://activemq.apache.org/shared-file-system-master-slave.html:
{quote}"If you have a SAN or shared file system it can be used to provide high 
availability such that if a broker is killed, another broker can take over 
immediately."
{quote}
It is important to note that if the shutdown takes more than
ACTIVEMQ_KILL_MAXSECONDS seconds, the broker is killed before db.free is saved,
so every following shutdown will be unclean as well. The broker stays in this
state until the index is deleted. This means that every failover will take more
than ACTIVEMQ_KILL_MAXSECONDS: if you increase this value to 5 minutes, your
failover can take more than 5 minutes.
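The self-perpetuating part can be illustrated with a toy model. It assumes (hypothetically) that the only slow shutdown work is the free-page scan, that the scan takes a fixed `scanSeconds`, and that the broker is killed once `killMaxSeconds` (standing in for ACTIVEMQ_KILL_MAXSECONDS) is exceeded. `ShutdownLoopModel` is an illustrative name, not ActiveMQ code.

```java
/**
 * Toy model of the shutdown loop described above (an illustration, not
 * ActiveMQ code). killMaxSeconds plays the role of ACTIVEMQ_KILL_MAXSECONDS;
 * scanSeconds is a hypothetical time needed to scan the whole index.
 */
final class ShutdownLoopModel {
    /** What one shutdown attempt looks like from the outside. */
    static final class Outcome {
        final long lockHeldSeconds; // time the broker keeps the lock while stopping
        final boolean clean;        // true if db.free was written before exit

        Outcome(long lockHeldSeconds, boolean clean) {
            this.lockHeldSeconds = lockHeldSeconds;
            this.clean = clean;
        }
    }

    private final long scanSeconds;
    private final long killMaxSeconds;
    private boolean freePagesKnown; // false right after an unclean shutdown

    ShutdownLoopModel(long scanSeconds, long killMaxSeconds, boolean freePagesKnown) {
        this.scanSeconds = scanSeconds;
        this.killMaxSeconds = killMaxSeconds;
        this.freePagesKnown = freePagesKnown;
    }

    /** Simulates one shutdown; all other shutdown work is ignored. */
    Outcome shutdown() {
        long needed = freePagesKnown ? 0 : scanSeconds;
        if (needed > killMaxSeconds) {
            // Killed mid-scan: db.free is never written, so freePagesKnown
            // stays false and the next shutdown starts from the same state.
            return new Outcome(killMaxSeconds, false);
        }
        // The scan (if any) finished in time and db.free was saved.
        freePagesKnown = true;
        return new Outcome(needed, true);
    }
}
```

With scanSeconds = 600 and killMaxSeconds = 300, every call to shutdown() holds the lock for 300 seconds and is unclean, no matter how many times the broker is restarted; with scanSeconds = 100 the first shutdown pays for the scan once and later ones are fast.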

 

To prevent ActiveMQ from reading the whole index file to search for free pages,
we can keep track of them on every checkpoint. To do that, we need to be sure
that db.data and db.free are in sync. To achieve this, we can add an attribute
to db.free that is also referenced by db.data.

So during the checkpoint we have:

1 - Save db.free and assign it a freePageUniqueId

2 - Save this freePageUniqueId in db.data (metadata)
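A minimal sketch of the checkpoint ordering, assuming both files are modeled as in-memory objects (`FreeFile`, `DataMetadata` and `IndexFiles` are hypothetical names, and the id is a random UUID here; the draft commit may generate it differently). What matters is the order: db.free is written with the new id before db.data references it. A real implementation would also have to force db.free to disk before step 2, since otherwise the two writes could reach the disk in either order.

```java
import java.util.Collections;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.UUID;

/** Hypothetical contents of db.free: the free-page list plus its unique id. */
final class FreeFile {
    final String freePageUniqueId;
    final SortedSet<Long> freePages;

    FreeFile(String freePageUniqueId, SortedSet<Long> freePages) {
        this.freePageUniqueId = freePageUniqueId;
        this.freePages = Collections.unmodifiableSortedSet(new TreeSet<>(freePages));
    }
}

/** Hypothetical slice of the db.data metadata that references db.free. */
final class DataMetadata {
    final String freePageUniqueId;

    DataMetadata(String freePageUniqueId) {
        this.freePageUniqueId = freePageUniqueId;
    }
}

/** In-memory stand-in for the two files on disk (illustrative only). */
final class IndexFiles {
    FreeFile dbFree;
    DataMetadata dbDataMetadata;

    /** Runs the two checkpoint steps in order and returns the new id. */
    String checkpoint(SortedSet<Long> currentFreePages) {
        String id = UUID.randomUUID().toString();
        // Step 1: save db.free tagged with a fresh freePageUniqueId.
        dbFree = new FreeFile(id, currentFreePages);
        // Step 2: record the same id in the db.data metadata.
        dbDataMetadata = new DataMetadata(id);
        return id;
    }
}
```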

After a crash, we can check whether db.data has the same freePageUniqueId as
db.free. If it does, we can safely use the free page information contained in
db.free.
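The check on startup could look roughly like the following sketch. It assumes the id from each file has already been read (null when missing) and takes the full-index scan as a callback; `FreePageRecovery.recover` is a hypothetical helper, not an existing KahaDB method.

```java
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Supplier;

/** Hypothetical startup check: trust db.free only if its id matches db.data. */
final class FreePageRecovery {
    private FreePageRecovery() {}

    /**
     * @param freeFileId    freePageUniqueId read from db.free (null if missing)
     * @param metadataId    freePageUniqueId read from the db.data metadata (null if missing)
     * @param freeFilePages free pages listed in db.free
     * @param fullScan      fallback that scans the whole index for free pages
     * @return the free-page list to start with
     */
    static SortedSet<Long> recover(String freeFileId, String metadataId,
                                   SortedSet<Long> freeFilePages,
                                   Supplier<SortedSet<Long>> fullScan) {
        if (freeFileId != null && freeFileId.equals(metadataId)) {
            // Both files describe the same checkpoint: reuse db.free as-is.
            return new TreeSet<>(freeFilePages);
        }
        // Missing or mismatched ids: fall back to the slow full-index scan.
        return fullScan.get();
    }
}
```

Passing the scan as a `Supplier` keeps the expensive path lazy: it only runs when the ids are missing or do not match.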

Now, the only case in which the whole index file has to be read again is a
crash between steps 1 and 2 (which is very unlikely).
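To see why only the window between steps 1 and 2 is a problem, this sketch injects a crash at each point of the checkpoint and reports whether the two ids still match. It is a hypothetical model (`CrashWindowDemo` is an illustrative name) that only tracks the freePageUniqueId durable in each file, assuming both files agreed after the previous checkpoint.

```java
/**
 * Toy crash-injection model (hypothetical, not ActiveMQ code). It tracks only
 * the freePageUniqueId that is durable in each file.
 */
final class CrashWindowDemo {
    private CrashWindowDemo() {}

    /** Where the simulated crash happens during a checkpoint. */
    enum CrashPoint { BEFORE_STEP_1, AFTER_STEP_1, AFTER_STEP_2 }

    /**
     * Starts from files that agree on previousId, runs a checkpoint that
     * writes newId, crashes at the given point, and returns true if the next
     * startup would have to scan the whole index (ids do not match).
     */
    static boolean needsFullScan(long previousId, long newId, CrashPoint crash) {
        long freeFileId = previousId;  // id durable in db.free
        long metadataId = previousId;  // id durable in the db.data metadata
        if (crash == CrashPoint.BEFORE_STEP_1) {
            return freeFileId != metadataId;
        }
        freeFileId = newId;            // step 1: db.free saved with the new id
        if (crash == CrashPoint.AFTER_STEP_1) {
            return freeFileId != metadataId;
        }
        metadataId = newId;            // step 2: new id recorded in db.data
        return freeFileId != metadataId;
    }
}
```

Only `AFTER_STEP_1` returns true: before step 1 both files still describe the previous checkpoint, and after step 2 both describe the new one.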

The drawback of this implementation is that we have to save db.free during
every checkpoint, which may increase the checkpoint time.

It is also important to note that db.free CAN (and should) contain stale data,
as it matches the equally stale db.data from the last checkpoint:

Imagine the timeline:

T0 -> P1, P2 and P3 are free.

T1 -> Checkpoint

T2 -> P1 got occupied.

T3 -> Crash

In this scenario, after PageFile#load, P1 will be free, and the journal replay
will then either mark P1 as occupied again or occupy another page (now that the
recovery of free pages is done on shutdown).
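The timeline can be replayed in a small hypothetical model: the free list is loaded from db.free as it was at T1, and the allocation made at T2 is re-executed from the journal. The allocator here simply takes the lowest free page, which is an assumption; the real replay may pick a different page, which is the "occupy another page" case. `StaleFreeListReplay` is an illustrative name, not KahaDB code.

```java
import java.util.SortedSet;
import java.util.TreeSet;

/**
 * Toy replay of the T0..T3 timeline (hypothetical model, not KahaDB code).
 * Allocations made after the checkpoint are re-executed on recovery against
 * the free list saved at the checkpoint.
 */
final class StaleFreeListReplay {
    private StaleFreeListReplay() {}

    /** Allocates the lowest free page, a stand-in for the real allocator. */
    static long allocate(SortedSet<Long> free) {
        long page = free.first();
        free.remove(page);
        return page;
    }

    /** Returns the free list after replaying the post-checkpoint allocations. */
    static SortedSet<Long> recover(SortedSet<Long> freeAtCheckpoint,
                                   int allocationsAfterCheckpoint) {
        // Loaded from db.free: the (stale) state at T1.
        SortedSet<Long> free = new TreeSet<>(freeAtCheckpoint);
        for (int i = 0; i < allocationsAfterCheckpoint; i++) {
            allocate(free); // journal replay re-does the T2 allocation
        }
        return free;
    }
}
```

Starting from {1, 2, 3} with one allocation after the checkpoint, the model ends with {2, 3}, the same free list as at T2.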

This change only makes sure that db.data and db.free are in sync and reflect
the state at T1 (the checkpoint). If they are in sync, we can trust db.free.

Here is a really quick draft of what I'm suggesting. If you guys agree, I can
create a proper patch afterwards:

[https://github.com/alanprot/activemq/commit/18036ef7214ef0eaa25c8650f40644dd8b4632a5]
 

This is related to https://issues.apache.org/jira/browse/AMQ-6590



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
