[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982207#comment-16982207
 ] 

Fangmin Lv commented on ZOOKEEPER-3619:
---------------------------------------

[~randgalt] we're working on the implementation and will share the code when 
it's in good shape.

But I can give some idea of how we're implementing this:

The follower forwards the acquire semaphore request to the leader. When 
preparing the txn, the leader knows whether the client can successfully 
acquire the semaphore or not. If it cannot and the request is an 
acquireOrFail request, the leader fails the request immediately; otherwise it 
creates an ephemeral node and tracks the liveness of the client with the 
global session.

A waiting request will either fail once the acquire timeout expires or be 
granted the semaphore after others release it. The leader decides who is 
going to own the semaphore, and the client is notified through the ZK server 
it is actually connected to.
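
To make the flow above concrete, here is a rough in-memory sketch of the 
leader-side decision. It is only an illustration under assumptions, not the 
actual patch: the class LeaderSemaphoreSketch, its method names, the Result 
enum and the explicit nowMs/timeoutMs parameters are all hypothetical, and 
the waiter queue merely stands in for the ephemeral node tracked by the 
global session.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical model of the leader-side decision; not ZooKeeper code.
class LeaderSemaphoreSketch {
    enum Result { GRANTED, FAILED, QUEUED }

    // A queued contender and the time at which its acquire gives up.
    private static final class Waiter {
        final long sessionId;
        final long deadlineMs;
        Waiter(long sessionId, long deadlineMs) {
            this.sessionId = sessionId;
            this.deadlineMs = deadlineMs;
        }
    }

    private final int permits;
    private final Set<Long> owners = new HashSet<>();
    private final Deque<Waiter> waiters = new ArrayDeque<>();

    LeaderSemaphoreSketch(int permits) { this.permits = permits; }

    // acquireOrFail: decided immediately, nothing is queued on failure.
    Result acquireOrFail(long sessionId) {
        if (owners.size() < permits) {
            owners.add(sessionId);
            return Result.GRANTED;
        }
        return Result.FAILED;
    }

    // acquire with timeout: grant if a permit is free, otherwise queue.
    Result acquire(long sessionId, long nowMs, long timeoutMs) {
        if (owners.size() < permits) {
            owners.add(sessionId);
            return Result.GRANTED;
        }
        waiters.addLast(new Waiter(sessionId, nowMs + timeoutMs));
        return Result.QUEUED;
    }

    // Release a permit and hand it to the oldest waiter that has not timed
    // out. Returns the sessions that should be notified as new owners.
    List<Long> release(long sessionId, long nowMs) {
        List<Long> granted = new ArrayList<>();
        if (!owners.remove(sessionId)) {
            return granted;
        }
        Iterator<Waiter> it = waiters.iterator();
        while (owners.size() < permits && it.hasNext()) {
            Waiter w = it.next();
            if (w.deadlineMs > nowMs) {
                it.remove();
                owners.add(w.sessionId);
                granted.add(w.sessionId);
            }
        }
        return granted;
    }

    // Drop waiters whose deadline has passed. Returns the sessions that
    // should be told their acquire timed out.
    List<Long> expireWaiters(long nowMs) {
        List<Long> expired = new ArrayList<>();
        Iterator<Waiter> it = waiters.iterator();
        while (it.hasNext()) {
            Waiter w = it.next();
            if (w.deadlineMs <= nowMs) {
                it.remove();
                expired.add(w.sessionId);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        LeaderSemaphoreSketch sem = new LeaderSemaphoreSketch(1);
        System.out.println(sem.acquireOrFail(1L));       // GRANTED
        System.out.println(sem.acquireOrFail(2L));       // FAILED
        System.out.println(sem.acquire(3L, 0L, 1000L));  // QUEUED
        System.out.println(sem.release(1L, 500L));       // [3]
    }
}
```

In this sketch a failed acquireOrFail leaves no state behind, which is the 
point of deciding at txn preparation time: nothing needs to be proposed or 
committed for the losing contender.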

> Implement server side semaphore API to improve the efficiency and throughput 
> of coordination 
> ---------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3619
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3619
>             Project: ZooKeeper
>          Issue Type: New Feature
>          Components: server
>    Affects Versions: 3.6.0
>            Reporter: Fangmin Lv
>            Assignee: Fangmin Lv
>            Priority: Major
>             Fix For: 3.6.0
>
>
> The ZK API is designed to be simple, flexible and general; it can meet 
> different scenarios such as coordination, member health tracking, meta 
> store, etc. But this general design has a cost: it makes client code heavy 
> and inefficient for recipes like distributed lock and semaphore.
> Currently, the general client-side semaphore implementation without waiting 
> time is:
>  # client A creates sequential and ephemeral node N-1
>  # client B creates sequential and ephemeral node N-2
>  # clients A and B query all children and check whether they are holding the 
> lock node with the smallest sequential id 
>  # since client A has the smaller sequential id, it is the semaphore owner 
> (assume the semaphore value is 1)
>  # client B will delete its node, close the session, and probably try again 
> later from step 2
> All the contenders will issue 4 writes (create session, create lock, delete 
> lock, close session) and 1 read (get children), which is pretty heavy and 
> does not scale well.
> We actually hit this issue internally for one heavy semaphore use case, and 
> we have to create dozens of ensembles to support their traffic.
> To make the semaphore recipe more efficient, we can move the semaphore 
> implementation to the server side, where the leader has all the context 
> about who will win the semaphore/lock at txn preparation time and can 
> short-circuit and fail the contender directly, without proposing and 
> committing those create/delete lock transactions.
> To implement this, we need to add a new semaphore API, which is supposed to 
> replace client-side lock, leader election (semaphore value 1), and general 
> semaphore use cases.
> We started to design and implement it recently; it will be based on another 
> big improvement that we've almost finished and will soon upstream in 
> ZOOKEEPER-3594, which skips proposing requests with error transactions.
> Meanwhile, we'd like to hear some early feedback from the community about 
> this feature.
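
The write/read count in the client-side recipe quoted above can be checked 
with a small in-memory model. This is only a sketch under assumptions: the 
class ClientRecipeCostSketch and its methods are hypothetical, it does not 
talk to a real ensemble, and it simply counts the operations each contender 
would issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cost model of the client-side recipe; not ZooKeeper code.
class ClientRecipeCostSketch {
    int writes;
    int reads;
    private final int permits;
    private final List<Integer> children = new ArrayList<>();
    private int nextSeq;

    ClientRecipeCostSketch(int permits) { this.permits = permits; }

    // One attempt without waiting. Returns the node's sequence number if
    // the contender became an owner, or -1 if it lost and cleaned up.
    int tryAcquireOnce() {
        writes++;                          // create session
        int seq = nextSeq++;
        children.add(seq);
        writes++;                          // create sequential ephemeral node
        reads++;                           // get children
        int rank = children.indexOf(seq);  // list stays in ascending order
        if (rank < permits) {
            return seq;
        }
        children.remove(Integer.valueOf(seq));
        writes++;                          // delete node
        writes++;                          // close session
        return -1;
    }

    // The owner eventually deletes its node and closes its session too.
    void release(int seq) {
        children.remove(Integer.valueOf(seq));
        writes += 2;
    }

    public static void main(String[] args) {
        ClientRecipeCostSketch cost = new ClientRecipeCostSketch(1);
        int owner = cost.tryAcquireOnce();  // client A wins
        cost.tryAcquireOnce();              // client B loses
        cost.tryAcquireOnce();              // client C loses
        cost.release(owner);
        // 3 contenders: 4 writes and 1 read each.
        System.out.println("writes=" + cost.writes + " reads=" + cost.reads);
    }
}
```

Every contender costs 4 writes and 1 read in this model, whether it wins or 
loses, which is the overhead the server-side API is meant to avoid for the 
losing contenders.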



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
