[jira] [Commented] (BOOKKEEPER-181) Scale hedwig

Roger Bush (JIRA) Tue, 24 Apr 2012 12:35:59 -0700

    [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260876#comment-13260876
 ]


Roger Bush commented on BOOKKEEPER-181:
---------------------------------------

@flavio

>> I really like the idea of revisiting the way we garbage-collect ledgers. I'm 
>> not sure I fully grasp the dequeue idea, but I can see that there might be 
>> some simple, more efficient ways to do it without having to scan all ledger 
>> metadata; for example, by keeping a table of deleted ledgers.

@sijie

>> I had considered the idea to keeping a table of deleted ledgers, when I 
>> implemented metastore-based ledger manager before. It seems easy, but 
>> actually it doesn't. The critical problem is how to remove the items from 
>> deleted ledgers table. if we don't remove them, the table will grow and 
>> encounter the same issue as SCAN.

Dequeue
-------
A dequeue is a queue that you can add at one end and remove from the other end. 
 This is an abstract concept that we would implement using a table (the place 
where the entries in the queue go, which are ledgers to be deleted), and two 
"pointers" (one to the head where you add, and the tail from which you remove). 
 For this to "work" it needs to be implementable using set/get/delete/CAS (and 
no scan), as well as showing how you do not get "dangling ledgers" since we 
have at least two separate things to update (head/tail pointer and list), and 
we can fail after the first is done.

Basic representation works like this - we have a table for each bookie.  This 
table represents the list of ledgers which may be deleted.  We do this 
per-bookie to make things easy on ourselves (easy to balance the overall load 
on a per-bookie basis).  The entries of each table are (k,v) where the key is 
an incrementing number starting from 1.  The value is the ledger id (and 
anything else we want to put).  Adding an entry at the head, requires reading 
the head pointer, updating it to make space and adding the entry at that key.  
Failure is handled handling the writes to head and tail in the right order.  A 
failure will cause a "blank entry" (conceptually) within the boundaries of our 
head/tail pointers (which are just long integers).  Adding to the head is done 
by reading the head pointer, incrementing, writing the new head pointer, and 
inserting the new ledger item at the previous pointer position.  A failure will 
result in a "blank space".  You can have multiple blank spaces (multiple 
failures).  Deleting an item requires the reverse order of operations (delete 
the item first, then move the pointer).  We are doing things in such a way that 
the list can contain blanks, but no ledger will fall outside the head/tail 
boundaries.  The thing that deletes will have to know that a null item is OK 
(and silently skip it).

This will keep the list trimmed and you won't run out of space.  Further, it is 
a minimal list and you could do some nice things like make sure the list is 
trimmed at a certain rate, per bookie.  I won't delve into those details now.

That's how I'd do this without having scan for BK.

What is not clear from this description is how to handle "wraparound".  But I 
think this can be easily done (this is just the classic "dequeue abstraction"). 
 I need to think a little and present this clearly.

Also there may be some details that I'm not thinking about (yi is going to 
present some details for the SNV team on BK internals today, so maybe this will 
be clearer).

Hopefully this is clear enough for someone to say why, from a BK standpoint, 
this wouldn't work or would work (if that is obvious at this point - it may not 
be).



                
> Scale hedwig
> ------------
>
>                 Key: BOOKKEEPER-181
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-181
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-server, hedwig-server
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.2.0
>
>         Attachments: hedwigscale.pdf, hedwigscale.pdf
>
>
> Current implementation of Hedwig and BookKeeper is designed to scale to 
> hundreds of thousands of topics, but now we are looking at scaling them to 
> tens to hundreds of millions of topics, using a scalable key/value store such 
> as HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (BOOKKEEPER-181) Scale hedwig

Reply via email to