[
https://issues.apache.org/jira/browse/BOOKKEEPER-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616397#comment-13616397
]
Jiannan Wang commented on BOOKKEEPER-590:
-----------------------------------------
I find there is a critical bug in current gc for metadata storage
implementation, which may cause most of ledgers been gc by mistake!
Each time after metadata storage scan part of ledger list, it return a new
LedgerRange by following constructor:
{code:java}
public LedgerRange(Set<Long> ledgers) {
this(ledgers, NOLIMIT, NOLIMIT);
}
{code}
which is assumed to be the full ledger list in metadata storage:
{code:java}
while(ledgerRangeIterator.hasNext()) {
LedgerRange lRange = ledgerRangeIterator.next();
Map<Long, Boolean> subBkActiveLedgers = null;
Long start = lRange.start();
Long end = lRange.end();
if (end != LedgerRange.NOLIMIT) {
subBkActiveLedgers = bkActiveLedgersSnapshot.subMap(start,
true, end, true);
} else {
if (start != LedgerRange.NOLIMIT) {
subBkActiveLedgers = bkActiveLedgersSnapshot.tailMap(start);
} else {
subBkActiveLedgers = bkActiveLedgersSnapshot; <- Code run to here!!
}
}
Set<Long> globalActiveLedgers = lRange.getLedgers();
LOG.debug("All active ledgers for hash node {}, Current active ledgers from
Bookie for hash node {}",
globalActiveLedgers, subBkActiveLedgers.keySet());
for (Long bkLid : subBkActiveLedgers.keySet()) {
if (!globalActiveLedgers.contains(bkLid)) {
// remove it from current active ledger
subBkActiveLedgers.remove(bkLid);
garbageCleaner.clean(bkLid);
}
}
}
{code}
You can easily check it by adding "MSLedgerManagerFactory.class" to
LedgerManagerTestCase#configs() and run test "mvn test -Dtest=GcLedgersTest".
So I recommend we consider this JIRA quickly or create another JIRA to fix the
problem.
> Another Scan-And-Compare GC Implementation
> ------------------------------------------
>
> Key: BOOKKEEPER-590
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-590
> Project: Bookkeeper
> Issue Type: Improvement
> Components: bookkeeper-server
> Reporter: Jiannan Wang
> Assignee: Jiannan Wang
> Attachments: BOOKKEEPER-590.patch
>
>
> The idea of Scan-And-Compare GC is as below:
> * Assume the ledger id list in local bookie server is *LocalLedgers*
> * At the same time, the ledger id list at metadata storage is *LiveLedgers*
> * Then the ledgers require garbage collection are *LocalLedgers -
> LiveLedgers*
> Under current implementation, an ledger id order guarantee is required when
> obtain *LiveLedgers* from metadata storage. However, this is unnecessary: we
> get *LocalLedgers* and we can just remove elements that in *LiveLedgers* one
> by one in any order.
> What's more, without the order requirement when scan all ledger ids, some
> things become simple:
> * We even don't need radix tree to maintain 64-bits ledger metadata, a
> hierarchical hash tree is enough (just as what topic metadata management
> does).
> * Easy to handle 64-bit ledger id backward compatibility for
> MSLedgerManager:
> ** Currently, for MSLedgerManager, we format ledger id to a fixed
> length (it's 10 now) digit string to make order scan
> ** When a 64-bit ledger id is used we need to enlarge the fixed length,
> then old ledger id backward compatibility turns to be a trouble if we require
> this order guarantee.
> As above reasons, it would better to remove specific order requirement from
> current Scan-And-Compare GC implementation.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira