Venkateswararao Jujjuri (JV) created BOOKKEEPER-944:
-------------------------------------------------------

             Summary: Multiple issues and improvements to BK Compaction.
                 Key: BOOKKEEPER-944
                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-944
             Project: Bookkeeper
          Issue Type: Improvement
          Components: bookkeeper-server
    Affects Versions: 4.4.0
            Reporter: Venkateswararao Jujjuri (JV)


We have identified multiple issues with BK compaction.
This issue is to list all of them in one Jira ticket.

1.
MajorCompaction and MinorCompaction are very basic. Either they do it or won’t 
do it. Proposal is to  add Low Water Mark(LWM) and High Water Mark(HWM) to the 
disk space. Have different compaction frequency and re-claim %s when the disk 
space is < low water mark  ,  >  LWM < HWM, > HWM.

2.
MajorCompaction and Minor Compactions are strictly frequency based. They should 
at least time of the day based, and also run during low system load, and if the 
system load raises, reduce the compaction depending on the disk availability 

3.
Current code disables compaction when disk space grows beyond configured 
threshold. There is no exit from this point. Have an option to keep reserved 
space for compaction, at least 2 entryLog file sizes when 
isForceGCAllowWhenNoSpace enabled.

4.
Current code toggles READONLY status of the bookie as soon as it falls below 
the disk storage threshold. Imagine if we keep 95% as the threshold, Bookie 
becomes RW as soon as it falls below 95 % and few more writes pushes it above 
95 and it turns back to RONLY. Use a set of defines (another set of LWM/HWM?) 
where Bookie turns RO on high end and won't become RW until it hits low end.

5.
Current code never checks if the compaction is enabled or disabled once the 
major/minor compaction is started. If the bookie goes > disk threshold (95%) 
and at that compaction is going on, it never checks until it finishes but there 
may not be disk available for compaction to take place. So check if compaction 
is enabled after processing every EntryLog.

6.
Current code changes the Bookie Cookie value even when new storage is added. 
When the cookie changes Bookie becomes a new one, and BK cluster treats it as 
new bookie. If we have mechanism to keep valid cookie even after adding 
additional disk space, we may have a chance to bring the bookie back to healthy 
mode and have compaction going.

7. Bug
CheckPoint was never attempted to complete after once sync failure. There is a 
TODO in the code for this area.

8.
When the disk is above threshold, Bookie goes to RO. If we have to restart the 
bookie, on the way back, bookie tries to create new entrylog and other files, 
which will fail because disk usage is above threshold, hence bookie refuses to 
come up.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to