[jira] [Commented] (BOOKKEEPER-181) Scale hedwig

Ivan Kelly (JIRA) Mon, 23 Apr 2012 02:50:05 -0700

    [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259485#comment-13259485
 ]


Ivan Kelly commented on BOOKKEEPER-181:
---------------------------------------

One concern I have with the direction of this is that we seem to be looking 
ahead, trying to preempt all possible future requirements and then trying to 
accommodate everything in one go. The problem with this is that it creates a 
lot of work upfront, which may never turn out to be necessary. I think it's 
better to do something which is simple and which meets our requirements now, 
and if more requirements come in later, simply change the interface. This 
interface isn't going to be set in stone once it's in. Also, creating a 
smaller/simpler interface now means smaller patches, which will make it much 
easier for us to review things and get them into trunk.

I have a couple of comments about the prototype also. I think Versioned should 
be called VersionedValue or similar. Occurred should be an inner class of 
Version. Version should even be an inner class of VersionedValue.
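
As a rough sketch of that nesting (the method names, the Occurred constants
and the LongVersion class are assumptions made up for illustration, not taken
from the prototype):

```java
// Sketch only: all member names here are assumptions, not the prototype's API.
final class VersionedValue<T> {

    /** Nested so that the version type travels with the value type. */
    interface Version {
        /** Hypothetical result of comparing two versions. */
        enum Occurred { BEFORE, AFTER, CONCURRENTLY }

        Occurred compare(Version other);
    }

    /** Hypothetical counter-based version, only to make the sketch runnable. */
    static final class LongVersion implements Version {
        private final long counter;

        LongVersion(long counter) {
            this.counter = counter;
        }

        @Override
        public Occurred compare(Version other) {
            if (!(other instanceof LongVersion)) {
                throw new IllegalArgumentException("incomparable version type");
            }
            long otherCounter = ((LongVersion) other).counter;
            if (counter < otherCounter) {
                return Occurred.BEFORE;
            }
            if (counter > otherCounter) {
                return Occurred.AFTER;
            }
            // Assumption: equal counters are reported as neither before nor after.
            return Occurred.CONCURRENTLY;
        }
    }

    private final T value;
    private final Version version;

    VersionedValue(T value, Version version) {
        this.value = value;
        this.version = version;
    }

    T getValue() {
        return value;
    }

    Version getVersion() {
        return version;
    }
}
```

Callers would then refer to VersionedValue.Version and
VersionedValue.Version.Occurred, so the three types stay together in one file.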

Regarding the sync/async question, we have two options here. The BookKeeper and 
Hedwig clients both assume an asynchronous data store, so for backends with sync 
APIs, such as HBase, there will have to be some sort of adapter in place. We 
have to decide whether we put this adapter in (1) the metastore layer or in (2) 
the LedgerManager layer (TopicManager in Hedwig). For (1) the metastore API 
itself would be completely async. For (2) the MetaStoreLedgerManager would 
implement an async->sync adapter, and then use the metastore API, which would be 
completely synchronous. The metastore API should be completely async or 
completely sync to keep the size of the implementation down. Personally I prefer 
option (1). It means backends which already have async APIs work very simply.
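
A minimal sketch of what option (1) could look like, assuming a callback-style
API. Callback, AsyncMetaStore, BlockingStore and BlockingStoreAdapter are
hypothetical names for illustration, not existing BookKeeper or HBase types;
the adapter simply runs the blocking call on an executor.

```java
import java.util.concurrent.Executor;

/** Hypothetical callback type; exactly one of result/error is meaningful. */
interface Callback<T> {
    void complete(T result, Throwable error);
}

/** Hypothetical fully asynchronous metastore API (option 1). */
interface AsyncMetaStore {
    void get(String key, Callback<byte[]> cb);

    void put(String key, byte[] value, Callback<Void> cb);
}

/** Stand-in for a backend whose client only offers blocking calls. */
interface BlockingStore {
    byte[] get(String key) throws Exception;

    void put(String key, byte[] value) throws Exception;
}

/** Adapter living in the metastore layer: blocking calls run on an executor. */
final class BlockingStoreAdapter implements AsyncMetaStore {
    private final BlockingStore store;
    private final Executor executor;

    BlockingStoreAdapter(BlockingStore store, Executor executor) {
        this.store = store;
        this.executor = executor;
    }

    @Override
    public void get(final String key, final Callback<byte[]> cb) {
        executor.execute(() -> {
            byte[] result;
            try {
                result = store.get(key);
            } catch (Exception e) {
                cb.complete(null, e);
                return;
            }
            cb.complete(result, null);
        });
    }

    @Override
    public void put(final String key, final byte[] value, final Callback<Void> cb) {
        executor.execute(() -> {
            try {
                store.put(key, value);
            } catch (Exception e) {
                cb.complete(null, e);
                return;
            }
            cb.complete(null, null);
        });
    }
}
```

With this shape, a backend that is already async would implement
AsyncMetaStore directly and need no adapter at all.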

For scan, there are a couple of use cases where we would need to scan everything 
in a "table". This may have many thousands of entries, so the API should 
be cursor based. We have had problems in the past with ZooKeeper's 
getChildren() API precisely because of this. Since it wasn't cursor based, it 
would try to pull down the whole list at once, which exceeded the max packet 
size for ZooKeeper.
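
To illustrate what "cursor based" could mean here, a small sketch follows.
ScanCursor and InMemoryTable are hypothetical names, the table is an in-memory
stand-in for a real backend, and the calls are synchronous only to keep the
example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical cursor: hands back at most maxEntries keys per call. */
interface ScanCursor {
    boolean hasMore();

    List<String> readNext(int maxEntries);
}

/** In-memory stand-in for a metastore "table", only to make the sketch runnable. */
final class InMemoryTable {
    private final TreeMap<String, byte[]> rows = new TreeMap<>();

    void put(String key, byte[] value) {
        rows.put(key, value);
    }

    ScanCursor openCursor() {
        return new ScanCursor() {
            // Last key handed out; null until the first batch is read.
            private String lastKey = null;

            @Override
            public boolean hasMore() {
                return lastKey == null ? !rows.isEmpty() : rows.higherKey(lastKey) != null;
            }

            @Override
            public List<String> readNext(int maxEntries) {
                // Resume strictly after the last key instead of rescanning.
                NavigableMap<String, byte[]> rest =
                        lastKey == null ? rows : rows.tailMap(lastKey, false);
                List<String> batch = new ArrayList<>();
                for (String key : rest.keySet()) {
                    if (batch.size() >= maxEntries) {
                        break;
                    }
                    batch.add(key);
                    lastKey = key;
                }
                return batch;
            }
        };
    }
}
```

The caller bounds the size of each response through maxEntries, so no single
reply has to carry the whole listing.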
                
> Scale hedwig
> ------------
>
>                 Key: BOOKKEEPER-181
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-181
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-server, hedwig-server
>            Reporter: Sijie Guo
>            Assignee: Sijie Guo
>             Fix For: 4.2.0
>
>         Attachments: hedwigscale.pdf, hedwigscale.pdf
>
>
> Current implementation of Hedwig and BookKeeper is designed to scale to 
> hundreds of thousands of topics, but now we are looking at scaling them to 
> tens to hundreds of millions of topics, using a scalable key/value store such 
> as HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
