[
https://issues.apache.org/jira/browse/BOOKKEEPER-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261247#comment-13261247
]
Roger Bush commented on BOOKKEEPER-181:
---------------------------------------
@flavio
We've done a first-pass walkthrough of the BK code. Looking at how ledgers
are currently garbage collected, it appears to do the following:
1. The garbage collector gets a list of active nodes. The active nodes are
determined by ephemeral nodes (one per reader on each ledger?).
2. The garbage collector then gets all the nodes. If a node is not in the
active list, its corresponding ledger is deleted (see the sketch below).
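To make sure we're reading that right, here is roughly what we think the loop
amounts to. The LedgerStore interface and its method names below are
hypothetical stand-ins for the real metadata/storage calls, not actual BK API:

    import java.util.Set;

    // Sketch of the scan-and-diff GC described above; LedgerStore and its
    // method names are hypothetical stand-ins for the real calls.
    class ScanAndDiffGc {
        void runOnce(LedgerStore store) throws Exception {
            // 1. Active ledgers, as advertised by the ephemeral nodes in ZK.
            Set<Long> active = store.listActiveLedgerIds();
            // 2. Walk every known ledger; anything not in the active set is
            //    assumed dead and its data is reclaimed.
            for (long ledgerId : store.listAllLedgerIds()) {
                if (!active.contains(ledgerId)) {
                    store.deleteLedgerData(ledgerId);
                }
            }
        }

        interface LedgerStore {
            Set<Long> listActiveLedgerIds() throws Exception;
            Iterable<Long> listAllLedgerIds() throws Exception;
            void deleteLedgerData(long ledgerId) throws Exception;
        }
    }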
Note that the dequeue idea is simply a way of having a deletion list without a
scan. It does not solve the reference counting problem (something still has to
determine _when_ something can be deleted and put it on the list). The only
thing it does is decouple the determination of whether something can be
deleted from the timing of the actual delete. A simpler model could, for
example, delete immediately when the reference count goes to zero. So for now,
let's table the dequeue idea, as there are more pressing issues to solve (if it
turns out to be needed we can trot the idea out later).
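To make the "decouple determination from timing" point concrete, here is a
minimal sketch (all names hypothetical) of a ref-counted table where release()
either deletes immediately when the count hits zero, or just enqueues the id
for a reaper to drain later, i.e. the dequeue model:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.atomic.AtomicInteger;

    // Minimal sketch of ref-counted ledgers; the refcount decision is the
    // same in both models, only the timing of the delete differs.
    class RefCountedLedgers {
        private final ConcurrentMap<Long, AtomicInteger> refCounts = new ConcurrentHashMap<>();
        private final BlockingQueue<Long> deletionQueue = new LinkedBlockingQueue<>();
        private final boolean deleteImmediately;

        RefCountedLedgers(boolean deleteImmediately) {
            this.deleteImmediately = deleteImmediately;
        }

        // A reader/owner takes a reference on the ledger.
        void acquire(long ledgerId) {
            refCounts.computeIfAbsent(ledgerId, id -> new AtomicInteger()).incrementAndGet();
        }

        // Dropping the last reference triggers deletion, either right away
        // (simpler model) or by handing the id to a deletion queue (dequeue idea).
        void release(long ledgerId) throws InterruptedException {
            AtomicInteger count = refCounts.get(ledgerId);
            if (count != null && count.decrementAndGet() == 0) {
                refCounts.remove(ledgerId);
                if (deleteImmediately) {
                    deleteLedger(ledgerId);
                } else {
                    deletionQueue.put(ledgerId);
                }
            }
        }

        private void deleteLedger(long ledgerId) {
            // placeholder for deleting the ledger's metadata and data
        }
    }

Either way something still has to maintain the counts; the queue only buys us
control over when the reaping happens.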
How would we go about replacing the current garbage collection scheme with
something that uses the Metastore interface? As an aside, I don't think the
scheme above scales, since you'll have to continuously scan through millions of
ledgers to find the few that can be reaped. There will be many calls to scan
(returning X items at a time), most of which bring back no ledgers to delete.
Here's one measurement from real life: Tribble adds about 10 volumes
(ledgers) per second, but there are something like 200,000 ledgers (volumes)
in total. So in steady state we are deleting 10 volumes per second out of
200,000 (we keep 1000 volumes of backlog for each of 200 logs). For this
use case, we would need to scan 200,000 records to find 10. I'd imagine SNP
would be even worse (1M ledgers scanned to find X).
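Back-of-the-envelope, assuming a scan call that pages through the ledgers with
some page size (itemsPerScan below is an assumed value, not something taken
from the code):

    // Rough cost of finding the deletable ledgers by scanning, plugging in
    // the Tribble numbers above; itemsPerScan is an assumed page size.
    class ScanCost {
        public static void main(String[] args) {
            long totalLedgers = 200_000;   // 1000 volumes of backlog x 200 logs
            long deletablePerPass = 10;    // roughly 10 volumes reaped per second
            long itemsPerScan = 1_000;     // assumed "return X items at a time"

            long scanCalls = (totalLedgers + itemsPerScan - 1) / itemsPerScan;
            System.out.printf("%d scan calls per pass to find %d deletable ledgers (%.3f%% hit rate)%n",
                    scanCalls, deletablePerPass, 100.0 * deletablePerPass / totalLedgers);
        }
    }

With those numbers that's 200 scan calls every second just to discover 10
deletable ledgers, a 0.005% hit rate.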
It would be more thrifty to keep a list of ledgers we can delete, if we can
discover them. It seems that ref counting might be a way to accomplish this.
Could we talk a little bit about flavio's idea of "revisiting garbage
collection"?
> Scale hedwig
> ------------
>
> Key: BOOKKEEPER-181
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-181
> Project: Bookkeeper
> Issue Type: Improvement
> Components: bookkeeper-server, hedwig-server
> Reporter: Sijie Guo
> Assignee: Sijie Guo
> Fix For: 4.2.0
>
> Attachments: hedwigscale.pdf, hedwigscale.pdf
>
>
> Current implementation of Hedwig and BookKeeper is designed to scale to
> hundreds of thousands of topics, but now we are looking at scaling them to
> tens to hundreds of millions of topics, using a scalable key/value store such
> as HBase.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira