[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193494#comment-13193494
 ] 

Ioan Eugen Stan commented on ZOOKEEPER-580:
-------------------------------------------

+1
                
> Document reasonable limits on the size and shape of data for a zookeeper 
> ensemble.
> ----------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-580
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-580
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: documentation
>            Reporter: bryan thompson
>
> I would like to have documentation which clarifies the reasonable limits on 
> the size and shape of data in a zookeeper ensemble.  Since all zookeeper 
> nodes and their data are replicated on each peer in an ensemble, there will 
> be a machine limit on the amount of data in a zookeeper instance, but I have 
> not seen any guidance on the estimation of that machine limit.  Presumably 
> the machine limits are primarily determined by the amount of heap available 
> to the JVM before swapping sets in; however, there may well be other, less 
> obvious limits on the number of children per node and the depth of the node 
> hierarchy (in addition to the already documented limit on the amount of data 
> in a node).  There may also be interactions between hierarchy depth and 
> performance, which I have not seen detailed anywhere.
> Guidance regarding pragmatic and machine limits would be helpful in choosing 
> designs using zookeeper which can scale.  For example, if metadata about each 
> shard of a partitioned database architecture is mapped onto a distinct znode 
> in zookeeper, then there could be a very large number of znodes for a large 
> database deployment.  While this would make it easy to reassign shards to 
> services dynamically, the design might impose an unforeseen limit on the #of 
> shards in the database.  A similar concern would apply to an attempt to 
> maintain metadata about each file in a distributed file system.
> Issue [ZOOKEEPER-272] described some problems when nodes have a large number 
> of children.  However, it did not elaborate on whether the change to an 
> Iterator model would break the atomic semantics of the List<String> of 
> children or if the Iterator would be backed by a snapshot of the children as 
> it existed at the time the iterator was requested, which would put a memory 
> burden on the ensemble.  This raises the related question of when designs 
> which work around scaling limits in zookeeper might break desirable 
> semantics, primarily the ability to have a consistent view of the distributed 
> state.
> Put another way, are there anti-patterns for zookeeper relating to 
> scalability?  Too many children?  Too much depth?  Avoid decomposing large 
> numbers of children into hierarchies?  Etc.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
