[ https://issues.apache.org/jira/browse/ZOOKEEPER-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Henry Robinson updated ZOOKEEPER-423: ------------------------------------- Attachment: ZOOKEEPER-423.patch [Non-garbage patch this time] This is a draft patch for this issue. I'd appreciate feedback on the approach. I've added two API methods - getFirstChild and getLastChild. I've changed DataNode to keep its child list in a TreeSet, not a HashSet. This may be controversial because operations are now logarithmic in the number of children, not O(1), but the datatree's hashmap is still used for most path-based lookups. Both API methods throw an Empty exception if there are no children. I've decided not to address the issue of *what* to sort by here, and just defaulted to sorting by node name. The API leaves the door open to specifying a sort order at the time of parent node creation (i.e. another argument to create(...)). This patch obviously lacks tests, comments and general cleanup. You should be able to use a shell to call lsfirst and lslast. Any comments on the APIs would be great - would we rather one getOrderedChild API which took a flag describing whether to get the first or last? Or a getNthChild API? > Add getFirstChild API > --------------------- > > Key: ZOOKEEPER-423 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-423 > Project: ZooKeeper > Issue Type: New Feature > Components: contrib-bindings, documentation, java client, server > Reporter: Henry Robinson > Attachments: ZOOKEEPER-423.patch > > > When building the distributed queue for my tutorial blog post, it was pointed > out to me that there's a serious inefficiency here. > Informally, the items in the queue are created as sequential nodes. For a > 'dequeue' call, all items are retrieved and sorted by name by the client in > order to find the name of the next item to try and take. This costs O( n ) > bandwidth and O(n.log n) sorting time - per dequeue call! Clearly this > doesn't scale very well. > If the servers were able to maintain a data structure that allowed them to > efficiently retrieve the children of a node in order of the zxid that created > them this would make successful dequeue operations O( 1 ) at the cost of O( n > ) memory on the server (to maintain, e.g. a singly-linked list as a queue). > This is a win if it is generally true that clients only want the first child > in creation order, rather than the whole set. > We could expose this to the client via this API: getFirstChild(handle, path, > name_buffer, watcher) which would have much the same semantics as > getChildren, but only return one znode name. > Sequential nodes would still allow the ordering of znodes to be made > explicitly available to the client in one RPC should it need it. Although: > since this ordering would now be available cheaply for every set of children, > it's not completely clear that there would be that many use cases left for > sequential nodes if this API was augmented with a getChildrenInCreationOrder > call. However, that's for a different discussion. > A halfway-house alternative with more flexibility is to add an 'order' > parameter to getFirstChild and have the server compute the first child > according to the requested order (creation time, update time, lexicographical > order). This saves bandwidth at the expense of increased server load, > although servers can be implemented to spend memory on pre-computing commonly > requested orders. I am only in favour of this approach if servers maintain a > data-structure for every possible order, and then the memory implications > need careful consideration. > [edit - JIRA interprets ( n ) without the spaces as a thumbs-down. cute.] -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira