[
https://issues.apache.org/jira/browse/BOOKKEEPER-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13260852#comment-13260852
]
Roger Bush commented on BOOKKEEPER-181:
---------------------------------------
@ivan
>> This is adding another abstraction into the metastore interface which only
>> makes sense for the bookkeeper delete ledger scenario. For hedwig this makes
>> no sense.
I don't think I was clear enough; that is not what I was saying. My point was
that ledger garbage collection can be implemented without scan, which
simplifies the kv API. Nor would I want to put the dequeue into the API (as you
point out, that makes no sense, and I agree). The dequeue is an
application-specific technique for implementing ledger deletion that relies
only on get/set/delete/CAS, not on SCAN.
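To make the idea concrete, here is a rough sketch of one way a pending-deletion
queue could be built from get/set/delete/CAS alone. Every name in it (CasStore,
DeleteQueue, the "gc/pending" and "ledger/<id>" keys, the comma-separated
encoding) is an assumption made up for illustration, not existing BK code and
not necessarily the exact scheme discussed earlier in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical scan-free store: only get/set/delete/CAS. */
class CasStore {
    private final Map<String, byte[]> data = new HashMap<>();

    synchronized byte[] get(String key) { return data.get(key); }  // null if absent
    synchronized void set(String key, byte[] value) { data.put(key, value); }
    synchronized void delete(String key) { data.remove(key); }

    /** Writes update only if the current value equals expected (null means absent). */
    synchronized boolean cas(String key, byte[] expected, byte[] update) {
        if (!Arrays.equals(data.get(key), expected)) return false;
        data.put(key, update);
        return true;
    }
}

/** Pending-deletion queue kept under one well-known key as "id,id,id". */
class DeleteQueue {
    static final String KEY = "gc/pending";  // hypothetical key name
    private final CasStore store;

    DeleteQueue(CasStore store) { this.store = store; }

    /** Appends a ledger id, retrying if another writer changed the queue first. */
    void enqueue(long ledgerId) {
        while (true) {
            byte[] cur = store.get(KEY);
            String prefix = (cur == null || cur.length == 0)
                    ? "" : new String(cur, StandardCharsets.UTF_8) + ",";
            byte[] next = (prefix + ledgerId).getBytes(StandardCharsets.UTF_8);
            if (store.cas(KEY, cur, next)) return;
        }
    }

    /** Removes and returns the oldest pending id, or null if the queue is empty. */
    Long dequeue() {
        while (true) {
            byte[] cur = store.get(KEY);
            if (cur == null || cur.length == 0) return null;
            String s = new String(cur, StandardCharsets.UTF_8);
            int comma = s.indexOf(',');
            String head = comma < 0 ? s : s.substring(0, comma);
            byte[] rest = comma < 0
                    ? new byte[0] : s.substring(comma + 1).getBytes(StandardCharsets.UTF_8);
            if (store.cas(KEY, cur, rest)) return Long.parseLong(head);
        }
    }

    /** Garbage collection without scan: drain the queue, deleting each ledger's metadata. */
    int collect() {
        int deleted = 0;
        Long id;
        while ((id = dequeue()) != null) {
            store.delete("ledger/" + id);  // hypothetical key layout
            deleted++;
        }
        return deleted;
    }
}
```

A single queue key is a contention point and is bounded by the store's value
size, so a real version would likely shard the queue; the sketch only shows
that no scan is needed.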
>> Moreover, Hedwig does require scan.
Sure, but this could be handled by adding a separate scan interface. BK would
rely on the abstract interface that doesn't include scan, and Hedwig would use
the scan interface as well. BK would be ignorant of whether the implementation
provides scan; Hedwig, since it needs scan, would use it. If an underlying
store has a natural SCAN, the implementer would also implement the scan API (a
single function); if it doesn't, the implementer would not. Thus, you'd have
the best of all worlds: BK can use a larger set of stores, Hedwig uses a
smaller set of stores (those with SCAN), and there is no additional work
required (no duplication of code, or things that are almost the same - we just
have a single, mixin scan interface).
In other words the Metastore API can be thought of as representing the needs of
the two separate applications BK and Hedwig. Since (if?) BK doesn't need scan,
why require it? The scan would exclude many kv stores.
There is a simple and elegant way to avoid code duplication while giving BK the
API it needs (no scan) and Hedwig the API it needs (+ scan): a mixin interface
that contains only the scan-related API. HBase would implement scan, Hedwig
would use scan, and BK would not. A different store with no ability to scan
could still be used for BK. Also, the mixin adds no extra maintenance or
implementation cost.
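As a minimal sketch of what I mean by a mixin, assuming Java interfaces and
made-up names (MetaStore, Scannable, HashStore, SortedStore are all
hypothetical, not an existing API): the base interface carries only
get/set/delete/CAS, and a store with a natural ordered scan additionally
implements the one-method scan interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Base contract (hypothetical): everything BK would depend on. */
interface MetaStore {
    byte[] get(String key);  // null if absent
    void set(String key, byte[] value);
    void delete(String key);
    /** Writes update only if the current value equals expected (null means absent). */
    boolean compareAndSet(String key, byte[] expected, byte[] update);
}

/** Mixin (hypothetical): implemented only by stores with a natural scan. */
interface Scannable {
    /** Keys in [startInclusive, endExclusive), in sorted order. */
    List<String> scan(String startInclusive, String endExclusive);
}

/** Shared get/set/delete/CAS logic over any in-memory map. */
class MapStore implements MetaStore {
    protected final Map<String, byte[]> data;

    MapStore(Map<String, byte[]> data) { this.data = data; }

    public synchronized byte[] get(String key) { return data.get(key); }
    public synchronized void set(String key, byte[] value) { data.put(key, value); }
    public synchronized void delete(String key) { data.remove(key); }

    public synchronized boolean compareAndSet(String key, byte[] expected, byte[] update) {
        if (!Arrays.equals(data.get(key), expected)) return false;
        data.put(key, update);
        return true;
    }
}

/** A store with no key ordering: usable by BK, not by Hedwig. */
class HashStore extends MapStore {
    HashStore() { super(new HashMap<String, byte[]>()); }
}

/** A sorted store: implements the base API plus the scan mixin. */
class SortedStore extends MapStore implements Scannable {
    private final TreeMap<String, byte[]> sorted;

    SortedStore() { this(new TreeMap<String, byte[]>()); }

    private SortedStore(TreeMap<String, byte[]> map) {
        super(map);
        this.sorted = map;
    }

    public synchronized List<String> scan(String startInclusive, String endExclusive) {
        return new ArrayList<>(sorted.subMap(startInclusive, endExclusive).keySet());
    }
}
```

BK code would be written against MetaStore only, so either store works for it;
Hedwig code would ask for a Scannable and so accept only SortedStore-like
implementations.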
>> Not having scan also has other implications for BK. For 4.2.0 we want to
>> implement a "fsck" functionality, which checks that each bookie contains
>> every ledger entry it should. This requires that we be able to get a list of
>> ledgers.
It is not clear that our group will be able to use HBase. This is why I'm
pushing not to require scan as part of the base BK requirements: it gives the
largest possible choice of kv stores. fsck could still be implemented using
scan, perhaps as an external tool that requires scan (again, this can use the
mixin approach).
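For illustration, an external tool could state its need for scan either at
compile time or at run time. The sketch below assumes Java and uses made-up
names (LedgerStore, LedgerScan, Fsck, and the "ledger/" key prefix are all
hypothetical); it only shows the ledger-listing step that fsck would start
from, not the per-bookie checks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical base API, reduced to what this sketch needs. */
interface LedgerStore {
    byte[] get(String key);
}

/** Hypothetical scan mixin. */
interface LedgerScan {
    List<String> scan(String startInclusive, String endExclusive);
}

class Fsck {
    /** Compile-time requirement: only stores implementing both interfaces are accepted. */
    static <S extends LedgerStore & LedgerScan> List<String> listLedgers(S store) {
        // "ledger0" sorts immediately after every key starting with "ledger/".
        return store.scan("ledger/", "ledger0");
    }

    /** Run-time variant for a tool that is handed an arbitrary store. */
    static List<String> listLedgersOrFail(LedgerStore store) {
        if (!(store instanceof LedgerScan)) {
            throw new UnsupportedOperationException("fsck needs a store with scan");
        }
        return ((LedgerScan) store).scan("ledger/", "ledger0");
    }
}

/** In-memory sorted store offering both the base API and scan. */
class TreeStore implements LedgerStore, LedgerScan {
    private final TreeMap<String, byte[]> data = new TreeMap<>();

    void put(String key, byte[] value) { data.put(key, value); }
    public byte[] get(String key) { return data.get(key); }

    public List<String> scan(String startInclusive, String endExclusive) {
        return new ArrayList<>(data.subMap(startInclusive, endExclusive).keySet());
    }
}

/** In-memory store with no scan: fine for BK itself, rejected by the tool. */
class PlainStore implements LedgerStore {
    private final Map<String, byte[]> data = new HashMap<>();

    void put(String key, byte[] value) { data.put(key, value); }
    public byte[] get(String key) { return data.get(key); }
}
```

With this shape, the BK server never references LedgerScan, so the choice of
store for BK itself stays open; only the tool narrows it.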
> Scale hedwig
> ------------
>
> Key: BOOKKEEPER-181
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-181
> Project: Bookkeeper
> Issue Type: Improvement
> Components: bookkeeper-server, hedwig-server
> Reporter: Sijie Guo
> Assignee: Sijie Guo
> Fix For: 4.2.0
>
> Attachments: hedwigscale.pdf, hedwigscale.pdf
>
>
> Current implementation of Hedwig and BookKeeper is designed to scale to
> hundreds of thousands of topics, but now we are looking at scaling them to
> tens to hundreds of millions of topics, using a scalable key/value store such
> as HBase.