[
https://issues.apache.org/jira/browse/BOOKKEEPER-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239625#comment-13239625
]
Sijie Guo commented on BOOKKEEPER-181:
--------------------------------------
Thanks for your comments, Flavio.
> For bookkeeper, we need to access ledger metadata both from clients and
> bookies, right?
yes. The implementation of the metastore-based ledger manager should be split
into two tasks, one for the client side and one for the server side. The
server part depends on the client part, because the client part handles how
ledger metadata is stored, while the server part handles how ledgers are
garbage collected.
> Is this correct? If so, the plugable interface will allow the use of
> different repositories for the metadata part, but we will still rely upon
> zookeeper to monitor node availability.
yes. We still use ZooKeeper for node availability, while moving metadata
operations to a different storage backend.
> In the definition of the compare-and-swap operation, the comparison is
> performed using the key and value itself. This might be expensive, so I was
> wondering if it is a better approach to use versions instead. The drawback is
> relying upon a backend that provides versioned data. It seems fine for me,
> though.
in the proposal, the comparison is applied to just a single cell (located by
(key, family, qualifier)), while the set operation can be applied to multiple
cells.
for example, suppose we have two columns: a *data* column, which stores the
actual data, and a *version* column, which stores a monotonically incremented
number. The initial value is (oldData, 0). When we want to update the data
column, we execute CAS (key, 0, key, (newData, 1)). The comparison is applied
only to the version column, not to the data column, so it is not expensive.
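to make that concrete, here is a minimal sketch of the pattern. The MetaStore
interface and all names below are hypothetical placeholders, not the actual
API in the proposal:
{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only; the real MetaStore API in the proposal may differ.
interface MetaStore {
    // Compares the single cell (key, family, qualifier) against 'expected';
    // only if it matches, atomically applies 'updates' (which may span
    // multiple cells of the same key).
    boolean compareAndSet(String key, String family, String qualifier,
                          byte[] expected, Map<String, byte[]> updates);
}

class VersionedDataUpdate {
    // Update the (possibly large) data cell, guarded by the small version cell.
    static boolean updateData(MetaStore store, String key,
                              long expectedVersion, byte[] newData) {
        Map<String, byte[]> updates = new HashMap<String, byte[]>();
        updates.put("meta:data", newData);
        updates.put("meta:version", Long.toString(expectedVersion + 1).getBytes());
        // The comparison reads only the version cell, never the data cell.
        return store.compareAndSet(key, "meta", "version",
                                   Long.toString(expectedVersion).getBytes(),
                                   updates);
    }
}
{code}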
To my knowledge, zk#setData provides a conditional set on the version: the set
operation succeeds only when the given version matches the version of the
znode, which is a kind of CAS. Exposing CAS directly would make it easier to
support more K/V stores.
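for comparison, the conditional set over a znode version looks roughly like
this with the synchronous ZooKeeper client (error handling trimmed):
{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

class ZkConditionalSet {
    // Returns true if the write took effect, false if the znode was updated
    // by someone else since we read 'expectedVersion'.
    static boolean conditionalSet(ZooKeeper zk, String path,
                                  byte[] newData, int expectedVersion)
            throws KeeperException, InterruptedException {
        try {
            Stat stat = zk.setData(path, newData, expectedVersion);
            return stat != null;
        } catch (KeeperException.BadVersionException e) {
            return false;
        }
    }
}
{code}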
> Related to the previous comment, it might be a better idea to state somewhere
> what properties we require from the backend store.
I think I have listed them in section 3, which describes the operations
required from a MetaStore.
> I'm not entirely sure I understand the implementation of leader election in
> 5.1. What happens if a hub is incorrectly suspected of crashing and it loses
> ownership over a topic? Does it find out via session expiration? Also, I
> suppose that if the hub has crashed but the list of hubs hasn't changed, then
> multiple iterations of 1 may have to happen.
>> I suppose that if the hub has crashed but the list of hubs hasn't changed,
>> then multiple iterations of 1 may have to happen.
doesn't this case also exist when using ZooKeeper? It seems there is still a
gap between the hub crashing and the znode deletion (session expiration). In
the metastore-based topic manager, this gap becomes the interval between the
hub crashing and the other hub servers being notified about the crash.
>> What happens if a hub is incorrectly suspected of crashing and it loses
>> ownership over a topic?
if a hub server has not actually crashed, the other hub servers would not
receive a notification from ZooKeeper saying that it crashed (can ZooKeeper
guarantee that?). So ownership would not change, since the other hub servers
still see the same zxid for that hub server.
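as a purely hypothetical sketch of what claiming topic ownership could look
like on top of such a CAS primitive (none of the names below are from the
proposal):
{code:java}
// Hypothetical sketch: a hub claims a topic with CAS, so two hubs that suspect
// each other cannot both become the owner. All names here are illustrative.
class TopicOwnershipSketch {
    interface MetaStore {
        // Compare one cell against 'expected' (null meaning "cell absent"),
        // and set it to 'update' only if the comparison succeeds.
        boolean compareAndSet(String key, String family, String qualifier,
                              byte[] expected, byte[] update);
    }

    // Returns true if this hub became the owner of the topic.
    static boolean claimTopic(MetaStore store, String topic, String myHubAddress) {
        // The claim succeeds only if no owner cell exists yet; a hub that has
        // merely lost connectivity still holds the cell, so ownership does not
        // move until its entry is actually removed.
        return store.compareAndSet(topic, "owner", "hub",
                                   null, myHubAddress.getBytes());
    }
}
{code}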
> Scale hedwig
> ------------
>
> Key: BOOKKEEPER-181
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-181
> Project: Bookkeeper
> Issue Type: Improvement
> Components: bookkeeper-server, hedwig-server
> Reporter: Sijie Guo
> Assignee: Sijie Guo
> Attachments: hedwigscale.pdf
>
>
> Current implementation of Hedwig and BookKeeper is designed to scale to
> hundreds of thousands of topics, but now we are looking at scaling them to
> tens to hundreds of millions of topics, using a scalable key/value store such
> as HBase.