[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506332#comment-13506332
 ] 

Flavio Junqueira commented on BOOKKEEPER-249:
---------------------------------------------

I think your interpretation of what I've written is correct, but I haven't 
expressed properly what I was trying to achieve, so let me step back.

When we delete a ledger L, we notify a set B of bookies through ZooKeeper 
that they need to garbage collect L. The set B is the union of all ensembles of 
L as written in its metadata. There are two possible reasons for ending up with 
spurious or zombie entries:

# A bookie b is not added to set B originally (bookie missing);
# A bookie b writes entries of L after garbage-collecting it (bookie race).

According to your examples, I think the former can happen if a bookie writes 
entries of L but ends up not forming part of the ensemble of L. I don't see a 
way of detecting it other than:

* Having a confirmation from the client that a bookie is actually part of the 
ledger ensemble, which in some sense "commits" the ledger fragment the bookie 
wrote. We don't have such a confirmation today, so it would be necessary to add 
this mechanism.
* Having bookies periodically check if the ledger metadata still exists.

The mechanism I was proposing was for the bookie race case, to avoid the extra 
polling mechanism you suggested. I was essentially trying to maintain a 
greatest lower bound for the ledgers that have already been deleted. I 
understand that we don't delete them in order, although my example did give 
that impression. 

To maintain such a greatest lower bound, I was thinking that we could delete 
only entire prefixes, with no holes. Let me go back to the example. If a bookie 
has entries for ledgers L1, L5, and L6, and L5 is deleted, then we wouldn't 
remove L5 or move the greatest lower bound until L1 is deleted. Once L1 is 
deleted, then we have a prefix formed by L1 and L5, and we remove the 
corresponding ledger fragments, also setting the greatest lower bound to 5.
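To make the idea concrete, here is a minimal sketch of that prefix-only scheme. This is not BookKeeper code; the class, method names, and the in-memory sets are all hypothetical, and it assumes ledger ids are simple comparable longs:

```java
import java.util.TreeSet;

// Hypothetical sketch of prefix-only garbage collection: a bookie reclaims
// a deleted ledger only once every stored ledger with a smaller id has also
// been deleted, so a single greatest lower bound (glb) summarizes all
// ledgers collected so far.
public class PrefixGc {
    // Ledger ids this bookie currently stores, in ascending order.
    private final TreeSet<Long> stored = new TreeSet<>();
    // Deleted ledgers waiting for the prefix below them to be deleted too.
    private final TreeSet<Long> pendingDeletes = new TreeSet<>();
    // Invariant: every ledger with id <= glb has been garbage-collected.
    private long glb = Long.MIN_VALUE;

    public void addLedger(long ledgerId) {
        stored.add(ledgerId);
    }

    // Mark a ledger deleted; reclaim it only once it completes a prefix.
    public void markDeleted(long ledgerId) {
        pendingDeletes.add(ledgerId);
        // Reclaim the longest prefix of stored ledgers that are all deleted.
        while (!stored.isEmpty() && pendingDeletes.contains(stored.first())) {
            long id = stored.pollFirst();
            pendingDeletes.remove(id);
            glb = id; // every ledger <= glb is now collected
            // here a real bookie would free the on-disk fragments of id
        }
    }

    public long getGlb() {
        return glb;
    }

    public boolean isStored(long ledgerId) {
        return stored.contains(ledgerId);
    }
}
```

With the example above: after addLedger(1), addLedger(5), addLedger(6), calling markDeleted(5) reclaims nothing (L1 still blocks the prefix); markDeleted(1) then reclaims both L1 and L5 and moves the greatest lower bound to 5, while L6 stays stored.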

One main drawback of this approach is having to wait until L1 is actually 
deleted, which can in principle happen arbitrarily far in the future. If 
ledgers don't live long, then it works fine. Otherwise, it could prevent 
bookies from reclaiming space for arbitrarily long periods.
                
> Revisit garbage collection algorithm in Bookie server
> -----------------------------------------------------
>
>                 Key: BOOKKEEPER-249
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-249
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-server
>            Reporter: Sijie Guo
>             Fix For: 4.2.0
>
>         Attachments: gc_revisit.pdf
>
>
> Per discussion in BOOKKEEPER-181, it would be better to revisit garbage 
> collection algorithm in bookie server. so create a subtask to focus on it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
