[jira] [Commented] (BOOKKEEPER-181) Scale hedwig

Roger Bush (JIRA) Tue, 24 Apr 2012 12:19:57 -0700

    [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260852#comment-13260852
 ]


Roger Bush commented on BOOKKEEPER-181:
---------------------------------------

@ivan

>> This is adding another abstraction into the metastore interface which only 
>> makes sense for the bookkeeper delete ledger scenario. For hedwig this makes 
>> no sense.

I don't think I was clear enough; that is not what I was saying.  My point was 
that ledger garbage collection can be implemented without scan, which 
simplifies the kv API.  Nor would I want to put the dequeue into the API (as 
you point out, this makes no sense, and I agree).  The dequeue is an 
application-specific technique for implementing ledger deletion that relies 
only on get/set/delete/CAS, not on SCAN.
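
To make this concrete, here is a rough sketch of a deletion queue kept in the 
kv store itself, using nothing but get/set/delete/CAS.  All names here (KvStore, 
InMemoryKvStore, LedgerDeletionQueue, the "gc/..." key layout) are made up for 
illustration and are not part of any existing BK or metastore API; a real 
version would also need to deal with a collector crashing mid-dequeue.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical minimal kv API: get/set/delete/CAS only, no scan. */
interface KvStore {
    String get(String key);                 // returns null if the key is absent
    void set(String key, String value);
    void delete(String key);
    /** Atomically replaces expected with update; expected == null means "absent". */
    boolean compareAndSet(String key, String expected, String update);
}

/** In-memory stand-in for a real store, for illustration only. */
class InMemoryKvStore implements KvStore {
    private final Map<String, String> map = new ConcurrentHashMap<>();
    public String get(String key) { return map.get(key); }
    public void set(String key, String value) { map.put(key, value); }
    public void delete(String key) { map.remove(key); }
    public boolean compareAndSet(String key, String expected, String update) {
        if (expected == null) {
            return map.putIfAbsent(key, update) == null;
        }
        return map.replace(key, expected, update);
    }
}

/** Queue of ledger ids awaiting deletion (assumed key layout: gc/head, gc/tail, gc/item/N). */
class LedgerDeletionQueue {
    private static final String HEAD = "gc/head";   // next slot to read
    private static final String TAIL = "gc/tail";   // next slot to write
    private final KvStore store;

    LedgerDeletionQueue(KvStore store) { this.store = store; }

    private static long parse(String counter) {
        return counter == null ? 0L : Long.parseLong(counter);
    }

    /** Reserves a slot with CAS on the tail counter, then writes the ledger id into it. */
    void enqueue(long ledgerId) {
        while (true) {
            String cur = store.get(TAIL);
            long slot = parse(cur);
            if (store.compareAndSet(TAIL, cur, Long.toString(slot + 1))) {
                store.set("gc/item/" + slot, Long.toString(ledgerId));
                return;
            }
            // Lost the race for this slot; re-read the tail and try again.
        }
    }

    /** Returns the next ledger id to collect, or null if nothing is ready. */
    Long dequeue() {
        while (true) {
            String cur = store.get(HEAD);
            long slot = parse(cur);
            if (slot >= parse(store.get(TAIL))) {
                return null;                        // queue is empty
            }
            String item = store.get("gc/item/" + slot);
            if (item == null) {
                return null;                        // slot reserved but not written yet
            }
            if (store.compareAndSet(HEAD, cur, Long.toString(slot + 1))) {
                store.delete("gc/item/" + slot);
                return Long.valueOf(item);
            }
            // Another collector advanced the head first; retry.
        }
    }
}
```

The garbage collector only ever follows the head counter, so it never needs to 
enumerate keys, which is the point: no SCAN is required of the store.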

>> Moreover, Hedwig does require scan.

Sure, but this could be handled by adding a scan interface.  You'd have BK 
relying on the abstract interface that doesn't include scan, and Hedwig using 
the scan.  BK would be ignorant of the fact that the implementation provides 
scan.  Hedwig, since it needs it, would use it.  If an underlying store had a 
natural SCAN, then the implementer would also implement the scan api (a single 
function), if it didn't, the implementer would not.  Thus, you'd have the best 
of all worlds:  BK can use a larger set of stores, Hedwig uses a smaller set of 
stores (with SCAN), and there is no additional work required (no duplication of 
code, or things that are almost the same - we just have a single, mixin scan 
interface).
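
A minimal sketch of what I mean by a mixin, in Java.  The interface and class 
names (MetaStore, Scannable, UnorderedStore, OrderedStore, LedgerMetadataClient, 
TopicLister) and the key prefixes are all hypothetical, chosen only to show the 
shape; they are not the proposed metastore API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical base API: everything BK would depend on. */
interface MetaStore {
    String get(String key);
    void set(String key, String value);
    void delete(String key);
    boolean compareAndSet(String key, String expected, String update);
}

/** Hypothetical mixin: a single function, implemented only by stores with a natural scan. */
interface Scannable {
    /** Returns the keys in [startKey, endKey) in sorted order. */
    List<String> scan(String startKey, String endKey);
}

/** Shared in-memory plumbing for the two example stores below. */
abstract class MapBackedStore implements MetaStore {
    protected final Map<String, String> data;
    MapBackedStore(Map<String, String> data) { this.data = data; }
    public synchronized String get(String key) { return data.get(key); }
    public synchronized void set(String key, String value) { data.put(key, value); }
    public synchronized void delete(String key) { data.remove(key); }
    public synchronized boolean compareAndSet(String key, String expected, String update) {
        if (!Objects.equals(data.get(key), expected)) {
            return false;
        }
        data.put(key, update);
        return true;
    }
}

/** A store with no key ordering: implements the base API only. */
class UnorderedStore extends MapBackedStore {
    UnorderedStore() { super(new HashMap<>()); }
}

/** An ordered store (a stand-in for something like HBase): base API plus the mixin. */
class OrderedStore extends MapBackedStore implements Scannable {
    private final TreeMap<String, String> sorted;
    OrderedStore() { this(new TreeMap<>()); }
    private OrderedStore(TreeMap<String, String> map) { super(map); this.sorted = map; }
    public synchronized List<String> scan(String startKey, String endKey) {
        return new ArrayList<>(sorted.subMap(startKey, endKey).keySet());
    }
}

/** BK-side code: typed against the base API, unaware of whether scan exists. */
class LedgerMetadataClient {
    private final MetaStore store;
    LedgerMetadataClient(MetaStore store) { this.store = store; }
    void createLedger(long ledgerId, String metadata) { store.set("ledger/" + ledgerId, metadata); }
}

/** Hedwig-side code: the type bound demands a store that has both APIs. */
class TopicLister<S extends MetaStore & Scannable> {
    private final S store;
    TopicLister(S store) { this.store = store; }
    List<String> topics() {
        // '0' is the character right after '/', so this range covers every "topic/..." key.
        return store.scan("topic/", "topic0");
    }
}
```

With this shape, new LedgerMetadataClient(new UnorderedStore()) compiles, while 
new TopicLister<>(new UnorderedStore()) does not, so a store without scan is 
rejected for Hedwig at compile time and still accepted for BK.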

In other words, the Metastore API can be thought of as representing the needs 
of two separate applications, BK and Hedwig.  If BK doesn't need scan, why 
require it?  Requiring scan would exclude many kv stores.

There is a simple and elegant way to avoid code duplication while giving BK the 
API it needs (no scan) and Hedwig the API it needs (+ scan):  a mixin interface 
that contains only the scan-related API.  hbase would then implement scan, 
Hedwig would use it, and BK would not.  A different store with no ability to 
scan could still be used for BK.  There is also no extra cost to maintain or 
implement.

>> Not having scan also has other implications for BK. For 4.2.0 we want to 
>> implement a "fsck" functionality, which checks that each bookie contains 
>> every ledger entry it should. This requires that we be able to get a list of 
>> ledgers.

It is not clear that our group will be able to use hbase.  This is why I'm 
pushing not to require scan as part of the base BK requirements:  leaving it 
out gives the largest possible choice of kv stores.  fsck could still be 
implemented using scan, but perhaps as an external tool that requires a store 
with scan (again, this can use the mixin approach).

                
> Scale hedwig
> ------------
>
>                 Key: BOOKKEEPER-181
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-181
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-server, hedwig-server
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.2.0
>
>         Attachments: hedwigscale.pdf, hedwigscale.pdf
>
>
> Current implementation of Hedwig and BookKeeper is designed to scale to 
> hundreds of thousands of topics, but now we are looking at scaling them to 
> tens to hundreds of millions of topics, using a scalable key/value store such 
> as HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
