[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259768#comment-13259768
 ] 

Roger Bush commented on BOOKKEEPER-181:
---------------------------------------

Ivan, thanks for your comments.

You bring up a very good point on the incremental approach; that is my thought 
as well.  We can always accommodate additional requirements later using, for 
example, additional mixin interfaces.  Better to do something simple, with a 
very well understood purpose that solves today's problems, and do it the right 
way.  Let's solve the problems we know we have to solve now, in a simple, 
understandable way.  That was my goal with the prototype, so hopefully that 
shows (I tried to make it simple, minimal and complete).

VersionedValue sounds good to me.  Concerning Occurred, there is also the 
long-standing -1, 0, +1 compare metaphor that we could use.  As far as making 
things inner classes goes, I'm all for hiding details where possible.  Keep in 
mind that Version needs to be implemented by the Metadata Store implementation.  
I'll look at your suggestions in the context of the code today and get back to 
you on that in a separate comment.
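
To make the -1, 0, +1 idea concrete, here is a minimal sketch of how it could 
map onto an Occurred-style enum.  All names here (Occurred, LongVersion) and 
the plain-long version are illustrative assumptions on my part, not the 
committed API:

```java
class VersionSketch {
    // Three-way outcome of comparing two versions.
    enum Occurred { BEFORE, CONCURRENTLY, AFTER }

    // Stand-in for whatever Version type the Metadata Store
    // implementation supplies; here just a monotonically increasing long.
    static class LongVersion {
        final long v;
        LongVersion(long v) { this.v = v; }

        Occurred compare(LongVersion other) {
            int c = Long.compare(v, other.v);   // the -1 / 0 / +1 metaphor
            if (c < 0) return Occurred.BEFORE;
            if (c > 0) return Occurred.AFTER;
            // Equal lands here; with vector-clock versions, truly
            // concurrent (incomparable) versions would land here too.
            return Occurred.CONCURRENTLY;
        }
    }
}
```

The enum keeps call sites readable while the underlying comparison can stay 
the familiar three-way compare.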

Concerning async/sync, it seems like the Metadata Store API can simply be made 
async, since each of its calls is 1-to-1 with the async store calls.  
Converting sync to async (and thus eliminating sync) would be done using the 
same callback style as BookKeeper, to keep things cohesive.  I'm in agreement 
with you on this one.  However, I would also include in the tests a way of 
making the calls synchronous, for some simple tests (they are easier to 
follow).  This doesn't change the need for async tests, but is an easy way to 
write simple, linear tests that are understandable.
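
As a sketch of what I mean by synchronous calls for tests: a CountDownLatch 
can block the test thread until the async callback fires.  The MetadataStore 
and GetCallback shapes below are assumptions for illustration, modeled loosely 
on the BookKeeper callback style:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

class SyncTestHelper {
    interface GetCallback { void getComplete(int rc, byte[] value); }
    interface MetadataStore { void asyncGet(String key, GetCallback cb); }

    // Blocks until the async get completes, so tests read linearly.
    static byte[] syncGet(MetadataStore store, String key)
            throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        AtomicReference<byte[]> result = new AtomicReference<>();
        store.asyncGet(key, (rc, value) -> {
            if (rc == 0) {          // 0 == OK, by assumption
                result.set(value);
            }
            latch.countDown();      // release the waiting test thread
        });
        latch.await();
        return result.get();
    }
}
```

The production API stays async-only; this wrapper lives with the tests (or in 
the utility library mentioned below).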

Once you get to something that is not 1-to-1 (API calls to internal async 
calls), you get some very distasteful-looking "continuation-passing style" 
code unless there is some sort of framework to make it look nice.  Does the 
adapter you mention use threads or some other technique?  It would be good to 
explore this a little.

Note we ran into this issue (async/sync) with the ManagedLedger (not to be 
confused with the LedgerManager you speak of above).  ManagedLedger needs to 
make several async calls and do different things based on the result (in 
addition to doing error recovery, of course).  When we tried to make this 
purely async, we created some very interesting code (interesting in a way we 
want to avoid).  I'm of the opinion that ManagedLedger needs to be threaded, 
or needs some async framework that makes the code more readable (e.g. a 
continuation-passing-style framework or otherwise).

As far as where we put this async->sync adapter, it seems like it needs to be 
available to the Metadata Store implementer, so this could be a utility library 
we supply with some notes (it would be nice if we could use this for the 
ManagedLedger too).

Concerning scan, there definitely need to be cursors to break it up into 
chunks; that is a certainty for scan.  However, in the case of BookKeeper, it 
seems like the use case is simply deleting ledgers that have been marked 
deleted by Hedwig.  In that case, couldn't we just use a simple "dequeue 
abstraction"?  This would avoid the need for scan.  Here's a sketch of how I 
envision this working:  each bookie has its own "table" for this.  This would 
let the trimmer do some nice scheduling of deletes that maps to the physical 
layer (we'd want to remove ledgers just a little faster than they are marked 
deleted - and perhaps there is a knob to make this go faster in emergency 
situations when we need to recover space).  The table for the bookie would 
have an expanding list (the dequeue) where entries are added at the tail and 
removed from the head.  Another table would hold a reference to both the head 
and the tail.  These would be updated in such a fashion that any system crash 
simply results in a blank entry just after/below the head/tail.  The entries 
would have keys "1", "2", ... and the value would be the ledger id.  The 
trimmer would remove the earliest entries (and update its cursor), and new 
mark-deletes would be added at the tail.  You'd need to use CAS on the cursor 
update, and do things in such a way that a failure doesn't strand an entry 
outside the bounds of the head/tail cursors.  There is wraparound to think of 
(the tail cursor going too high).  This must be a technique that already 
exists, no?
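
Here is a rough in-memory sketch of the dequeue idea, assuming a single 
writer per bookie (each bookie owns its table) and a single trimmer.  The 
names and the map standing in for the key/value table are all hypothetical, 
and cursor wraparound is not handled:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class MarkDeleteQueue {
    // Stands in for the bookie's key/value "table": slot number -> ledger id.
    private final ConcurrentHashMap<Long, Long> table = new ConcurrentHashMap<>();
    private final AtomicLong head = new AtomicLong(1);  // next slot to trim
    private final AtomicLong tail = new AtomicLong(0);  // last slot written

    // Hedwig marks a ledger deleted: write the entry first, then publish
    // it by advancing the tail cursor via CAS.  A crash in between leaves
    // at worst a blank entry just past the tail, inside no one's bounds.
    void markDelete(long ledgerId) {
        long t = tail.get();
        table.put(t + 1, ledgerId);
        tail.compareAndSet(t, t + 1);
    }

    // Trimmer removes from the head; returns null when drained.  The
    // cursor is advanced via CAS before the entry is physically removed,
    // so a crash leaves at worst a stale entry just below the head.
    Long trimOne() {
        long h = head.get();
        if (h > tail.get()) return null;        // nothing to trim
        Long ledgerId = table.get(h);
        if (head.compareAndSet(h, h + 1)) {
            table.remove(h);
        }
        return ledgerId;
    }
}
```

Against a real Metadata Store the CAS would be the store's test-and-set on the 
cursor record rather than an in-process atomic, but the crash-safety ordering 
(write entry, then move cursor) is the point of the sketch.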

So if scan can be elided from the BookKeeper requirements by this simple (and 
hopefully workable) technique, what we could do is have an additional scan 
Java interface for Hedwig, which may need it.  The underlying implementation 
would also have to implement this interface to support Hedwig.  This can be 
checked at load time.  We can write some code to show how everything 
cooperates nicely (to show: some usage code demonstrating elegant extended 
requirements without any code duplication).
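
A quick sketch of the load-time check, with scan as a mixin interface.  All 
the names here are hypothetical, not the prototype's API:

```java
class CapabilityCheck {
    // Core interface every Metadata Store implements (get/put/remove...).
    interface MetadataStore { }

    // Additional mixin that only Hedwig requires.
    interface ScannableStore extends MetadataStore {
        Iterable<String> scan(String startKey, int chunkSize);
    }

    // Hedwig calls this once at load time; BookKeeper never needs to.
    static ScannableStore requireScan(MetadataStore store) {
        if (!(store instanceof ScannableStore)) {
            throw new IllegalArgumentException(
                "Hedwig requires a Metadata Store that supports scan");
        }
        return (ScannableStore) store;
    }
}
```

BookKeeper codes only against MetadataStore; Hedwig does the one instanceof 
check at startup and then uses the extended interface, with no duplication in 
either client.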
                
> Scale hedwig
> ------------
>
>                 Key: BOOKKEEPER-181
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-181
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-server, hedwig-server
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.2.0
>
>         Attachments: hedwigscale.pdf, hedwigscale.pdf
>
>
> Current implementation of Hedwig and BookKeeper is designed to scale to 
> hundreds of thousands of topics, but now we are looking at scaling them to 
> tens to hundreds of millions of topics, using a scalable key/value store such 
> as HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

