[
https://issues.apache.org/jira/browse/BOOKKEEPER-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259485#comment-13259485
]
Ivan Kelly commented on BOOKKEEPER-181:
---------------------------------------
One concern I have with the direction of this is that we seem to be looking
ahead, trying to preempt all possible future requirements and then trying to
accommodate everything in one go. The problem with this is that it creates a
lot of work upfront, which may never turn out to be necessary. I think it's
better to do something which is simple and which meets our requirements now,
and if more requirements come in later, simply change the interface. This
interface isn't going to be set in stone once it's in. Also, creating a
smaller/simpler interface now means smaller patches, which will make it much
easier for us to review things and get them into trunk.

I also have a couple of comments about the prototype. I think Versioned should
be called VersionedValue or similar. Occurred should be an inner class of
Version. Version should even be an inner class of VersionedValue.
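
To make the suggested nesting concrete, here is a rough sketch. Only the names
VersionedValue, Version and Occurred come from the comment above; the fields,
the compare() method and the Occurred constants are assumptions for
illustration, not taken from the prototype.

```java
// Sketch only: the nesting proposed above. Field names, compare() and the
// Occurred constants are assumptions, not the prototype's actual members.
final class VersionedValue<T> {

    /** A version that can be ordered relative to another version. */
    interface Version {

        /** Outcome of comparing two versions (constant names are assumed). */
        enum Occurred { BEFORE, AFTER, CONCURRENTLY }

        Occurred compare(Version other);
    }

    private final T value;
    private final Version version;

    VersionedValue(T value, Version version) {
        this.value = value;
        this.version = version;
    }

    T getValue() { return value; }

    Version getVersion() { return version; }
}
```

With this layout callers refer to VersionedValue.Version and
VersionedValue.Version.Occurred, so the three types travel together.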

Regarding the sync/async thing, we have two options here. The BookKeeper and
Hedwig clients both assume an asynchronous data store, so for backends with
sync APIs, such as HBase, there will have to be some sort of adapter in place.
We have to decide whether we put this adapter in (1) the metastore layer or in
(2) the LedgerManager layer (TopicManager in Hedwig). For (1) the metastore API
itself would be completely async. For (2) the MetaStoreLedgerManager would
implement an async->sync adapter, and then use the metastore API, which would
be completely synchronous. Either way, the metastore API should be completely
async or completely sync, to keep the size of the implementation down.
Personally I prefer option (1). It means backends which already have async
APIs work very simply.
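
As a rough illustration of option (1), the sketch below puts the adapter in the
metastore layer: the API is fully asynchronous, and a backend with a blocking
client is wrapped by running its calls on a thread pool. Every type here
(Callback, AsyncMetaStore, BlockingStore, SyncBackendAdapter) and the return
codes are hypothetical; none of them is an existing BookKeeper, Hedwig or HBase
API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;

/** Hypothetical callback type, invoked once per operation. */
interface Callback<T> {
    void complete(int rc, T value);
}

/** Hypothetical fully asynchronous metastore API (option 1). */
interface AsyncMetaStore {
    void get(String key, Callback<String> cb);
    void put(String key, String value, Callback<Void> cb);
}

/** Stand-in for a backend whose client calls block. */
class BlockingStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    String get(String key) { return data.get(key); }
    void put(String key, String value) { data.put(key, value); }
}

/** Adapter in the metastore layer: blocking calls run on a thread pool. */
class SyncBackendAdapter implements AsyncMetaStore {
    static final int OK = 0;     // assumed return codes
    static final int ERROR = -1;

    private final BlockingStore store;
    private final ExecutorService executor;

    SyncBackendAdapter(BlockingStore store, ExecutorService executor) {
        this.store = store;
        this.executor = executor;
    }

    @Override
    public void get(String key, Callback<String> cb) {
        executor.execute(() -> {
            String value;
            try {
                value = store.get(key);   // blocking call, off the caller's thread
            } catch (RuntimeException e) {
                cb.complete(ERROR, null);
                return;
            }
            cb.complete(OK, value);
        });
    }

    @Override
    public void put(String key, String value, Callback<Void> cb) {
        executor.execute(() -> {
            try {
                store.put(key, value);    // blocking call, off the caller's thread
            } catch (RuntimeException e) {
                cb.complete(ERROR, null);
                return;
            }
            cb.complete(OK, null);
        });
    }
}
```

A backend that already has an async client would implement AsyncMetaStore
directly and need no adapter, which is the simplicity argued for above.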

For scan, there are a couple of use cases where we would need to scan
everything in a "table". This may have many thousands of entries, so the API
should be cursor based. We have had problems in the past with ZooKeeper's
getChildren() API precisely because of this. Since it wasn't cursor based, it
would try to pull down the whole list at once, which exceeded the max packet
size for ZooKeeper.
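
A minimal sketch of what "cursor based" could look like is below. The
ScanCursor and InMemoryTable types are hypothetical and only show the paging
behaviour: each call returns a bounded batch, so no single response has to
carry the whole table. It is written synchronously for brevity; in a fully
async metastore API, readNext would presumably take a callback instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical cursor over the keys of one "table". */
interface ScanCursor {
    boolean hasMore();

    /** Returns at most maxEntries keys and advances the cursor. */
    List<String> readNext(int maxEntries);
}

/** In-memory stand-in for a table, used only to demonstrate paging. */
class InMemoryTable {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String key, String value) { rows.put(key, value); }

    ScanCursor openCursor() {
        return new ScanCursor() {
            // Last key handed out; null until the first batch is read.
            private String lastKey = null;

            private NavigableMap<String, String> remaining() {
                return lastKey == null ? rows : rows.tailMap(lastKey, false);
            }

            @Override
            public boolean hasMore() {
                return !remaining().isEmpty();
            }

            @Override
            public List<String> readNext(int maxEntries) {
                List<String> batch = new ArrayList<>();
                for (String key : remaining().keySet()) {
                    if (batch.size() >= maxEntries) {
                        break;
                    }
                    batch.add(key);
                }
                if (!batch.isEmpty()) {
                    lastKey = batch.get(batch.size() - 1);
                }
                return batch;
            }
        };
    }
}
```

A caller would loop on hasMore() and readNext(n), picking n small enough that
one batch stays well under whatever response size limit the backend imposes.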
> Scale hedwig
> ------------
>
> Key: BOOKKEEPER-181
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-181
> Project: Bookkeeper
> Issue Type: Improvement
> Components: bookkeeper-server, hedwig-server
> Reporter: Sijie Guo
> Assignee: Sijie Guo
> Fix For: 4.2.0
>
> Attachments: hedwigscale.pdf, hedwigscale.pdf
>
>
> Current implementation of Hedwig and BookKeeper is designed to scale to
> hundreds of thousands of topics, but now we are looking at scaling them to
> tens to hundreds of millions of topics, using a scalable key/value store such
> as HBase.