[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944354#comment-13944354
 ] 

Rakesh R commented on ZOOKEEPER-1502:
-------------------------------------

Thanks [~michim] for the comments. I will address points 1 and 3.

Regarding the 2nd point:
bq. consider Ted's suggestion of using PID to detect stale lock files.
A few cases that come to mind:
# Normal JVM termination - FileLock#release() releases the lock and 
File#deleteOnExit() deletes the lock file, so another server can then 
acquire the lock.
# Abnormal JVM exit (kill) - The previous 'in_use.lock' file is left behind 
as an orphan, but the OS drops the lock when the process dies, so another 
server that tries to acquire it will succeed.
# JVM running without releasing the lock (probably the embedded zkserver 
case) - If this case exists, I feel it would be a functional issue in our 
lock implementation.
# Lock is acquired by a non-zk process - Only the zk server is interested in 
acquiring 'in_use.lock'. Do we need to consider external processes acquiring 
in_use.lock?

Are there any other cases that result in a stale lock?

AFAIK FileLock itself lets us acquire the lock only if no other JVM holds 
it, so I feel no special handling using PID checks is required. 
Does this sound good to you?
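
To make the cases above concrete, here is a minimal sketch of the FileLock approach. It is an illustration only and is not taken from the attached patch: the class name DataDirLock and the method tryAcquire are hypothetical, and only the 'in_use.lock' file name comes from the discussion. It assumes the standard java.nio behaviour where FileChannel#tryLock() returns null when another process holds the lock and throws OverlappingFileLockException when the same JVM already holds it.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

/** Hypothetical helper (not from the ZOOKEEPER-1502 patch). */
public class DataDirLock implements AutoCloseable {
    private final RandomAccessFile file;
    private final FileLock lock;

    private DataDirLock(RandomAccessFile file, FileLock lock) {
        this.file = file;
        this.lock = lock;
    }

    /**
     * Tries to take an exclusive lock on dataDir/in_use.lock.
     * Returns the held lock, or null if it is already held elsewhere.
     */
    public static DataDirLock tryAcquire(File dataDir) throws IOException {
        File lockFile = new File(dataDir, "in_use.lock");
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
        FileLock lock = null;
        try {
            // Non-blocking: null means another process holds the lock.
            lock = raf.getChannel().tryLock();
        } catch (OverlappingFileLockException e) {
            // The lock is already held inside this JVM (embedded server case).
        } finally {
            if (lock == null) {
                raf.close();
            }
        }
        if (lock == null) {
            return null;
        }
        // Normal termination: remove the lock file when the JVM exits.
        lockFile.deleteOnExit();
        return new DataDirLock(raf, lock);
    }

    /** Releases the lock and closes the underlying file. */
    @Override
    public void close() throws IOException {
        lock.release();
        file.close();
    }
}
```

With this sketch, a leftover 'in_use.lock' file from a killed JVM does not block a new server, because the check is on the lock itself and not on the existence of the file.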

> Prevent multiple zookeeper servers from using the same data directory
> ---------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1502
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1502
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Will Johnson
>            Assignee: Rakesh R
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-1502.patch
>
>
> We recently ran into an issue where two ZooKeeper servers, which were part 
> of two separate quorums, were configured to use the same data directory.  
> Interestingly, the ZooKeeper servers did not seem to complain and both seemed 
> to work fine until one of them was restarted.  Once that happened, all sorts 
> of chaos ensued.  I understand that this is a misconfiguration, but should 
> ZooKeeper complain about this, or do users need to protect themselves in some 
> external fashion?  Is a simple file lock enough, or are there other things I 
> should take into consideration if it’s up to me to handle?  



--
This message was sent by Atlassian JIRA
(v6.2#6252)
