[jira] Commented: (ZOOKEEPER-335) zookeeper servers should commit the new leader txn to their logs.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934202#action_12934202 ] Flavio Junqueira commented on ZOOKEEPER-335:

Radu, it sounds like the problem you mention has been resolved in ZOOKEEPER-790. I'm not sure which version you're using, but perhaps you should consider moving to 3.3.2.

zookeeper servers should commit the new leader txn to their logs.

Key: ZOOKEEPER-335
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-335
Project: Zookeeper
Issue Type: Bug
Components: server
Affects Versions: 3.1.0
Reporter: Mahadev konar
Assignee: Mahadev konar
Priority: Blocker
Fix For: 3.4.0
Attachments: faultynode-vishal.txt, zk.log.gz, zklogs.tar.gz, ZOOKEEPER-790.travis.log.bz2

Currently the zookeeper followers do not commit the new leader transaction. In a failure scenario this can lead to a follower acking the same leader txn id twice, to what might be two different intermittent leaders, allowing them to propose two different txns with the same zxid.

-- This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
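The collision scenario above hinges on how a zxid is laid out: the leader epoch lives in the high 32 bits and a per-epoch counter in the low 32 bits, so two leaders proposing under the same epoch can mint the same zxid for different txns. A minimal sketch of that layout (helper names are illustrative, not ZooKeeper's API):

```java
// Illustrative zxid packing: epoch in the high 32 bits, counter in the low 32.
public class Zxid {
    static long make(long epoch, long counter) {
        return (epoch << 32L) | (counter & 0xffffffffL);
    }
    static long epochOf(long zxid) { return zxid >> 32L; }
    static long counterOf(long zxid) { return zxid & 0xffffffffL; }

    public static void main(String[] args) {
        long z = make(5, 7);
        if (epochOf(z) != 5 || counterOf(z) != 7) {
            throw new AssertionError("unexpected zxid layout");
        }
        // Two leaders in the same epoch proposing the 7th txn collide on zxid,
        // even if the txn contents differ; committing the new-leader txn bumps
        // the epoch and avoids this.
        if (make(5, 7) != z) {
            throw new AssertionError("expected identical zxids");
        }
        System.out.println("ok");
    }
}
```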
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933718#action_12933718 ] Flavio Junqueira commented on ZOOKEEPER-880:

One problem here is that we had some discussions over IRC and the information is not reflected here. If you have a look at the logs, you'll observe this:

{noformat}
2010-09-28 10:31:22,227 DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection request /10.10.20.5:41861
2010-09-28 10:31:22,227 DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection request: 0
2010-09-28 10:31:22,227 DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager: Address of remote peer: 0
2010-09-28 10:31:22,229 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Connection broken:
java.io.IOException: Channel eof
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:595)
{noformat}

If I remember the discussion with J-D correctly, the node trying to connect is running Nagios. My conjecture at the time was that the IOException was killing the receiver thread, but not the sender thread (RecvWorker.finish() does not close its SendWorker counterpart). Your point is good, but it sounds like the race you mention would have to be triggered continuously to cause the number of SendWorker threads to grow steadily. That seems unlikely to me.
QuorumCnxManager$SendWorker grows without bounds

Key: ZOOKEEPER-880
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.2.2
Reporter: Jean-Daniel Cryans
Priority: Critical
Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz

We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads, up to the point where the OS runs out of native threads, and at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like:

{noformat}
tickTime=3000
dataDir=/somewhere_thats_not_tmp
clientPort=2181
initLimit=10
syncLimit=5
server.0=sv4borg9:2888:3888
server.1=sv4borg10:2888:3888
server.2=sv4borg11:2888:3888
server.3=sv4borg12:2888:3888
server.4=sv4borg13:2888:3888
{noformat}

The issue is on the first server. I'm going to attach thread dumps and logs in a moment.
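The orphan-thread conjecture above (RecvWorker dying on an IOException without taking down its SendWorker counterpart) can be modeled in a few lines. This is a standalone sketch with simplified bookkeeping, not the actual QuorumCnxManager code; only the class names mirror the real ones:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of per-peer worker pairs: the fix under discussion is for
// RecvWorker.finish() to also tear down the SendWorker for the same sid, so a
// broken connection ("Channel eof") cannot leave a sender thread orphaned.
public class WorkerPair {
    static class SendWorker {
        volatile boolean running = true;
        void finish() { running = false; }
    }

    static class RecvWorker {
        final long sid;
        final Map<Long, SendWorker> senders;
        volatile boolean running = true;

        RecvWorker(long sid, Map<Long, SendWorker> senders) {
            this.sid = sid;
            this.senders = senders;
        }

        // Called on IOException: stop receiving AND close the paired sender.
        void finish() {
            running = false;
            SendWorker sw = senders.remove(sid);
            if (sw != null) {
                sw.finish();
            }
        }
    }

    public static void main(String[] args) {
        Map<Long, SendWorker> senders = new ConcurrentHashMap<>();
        SendWorker sw = new SendWorker();
        senders.put(3L, sw);
        RecvWorker rw = new RecvWorker(3L, senders);

        rw.finish(); // simulate "Connection broken: Channel eof"
        if (sw.running || senders.containsKey(3L)) {
            throw new AssertionError("SendWorker was orphaned");
        }
        System.out.println("ok");
    }
}
```

Without the counterpart shutdown, each broken connection would leak one SendWorker, which matches the steady thread growth reported in this issue.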
[jira] Commented: (ZOOKEEPER-934) Add sanity check for server ID
[ https://issues.apache.org/jira/browse/ZOOKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933719#action_12933719 ] Flavio Junqueira commented on ZOOKEEPER-934:

One more comment. Looking at the logs for ZOOKEEPER-880, I remembered that in their case the RecvWorker thread was able to read a valid id from the connection with a Nagios server. I'm not exactly sure how that happened, but it essentially tells us that the simple check you proposed might not be enough. We don't want a Nagios box impersonating a ZooKeeper server! :-)

Add sanity check for server ID

Key: ZOOKEEPER-934
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-934
Project: Zookeeper
Issue Type: Sub-task
Reporter: Vishal K
Fix For: 3.4.0

2. Should I add a check to reject connections from peers that are not listed in the configuration file? Currently, we are not doing any sanity check for server IDs. I think this might fix ZOOKEEPER-851. The fix is simple. However, I am not sure if anyone in the community is relying on this ability.
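The check being proposed could be sketched as follows. This is a hypothetical standalone version (the method name and configured-id set are assumptions; the real change would live around QuorumCnxManager's connection handling), but it shows why an id-in-configuration test is stronger than merely parsing a valid-looking id:

```java
import java.util.Set;

// Hypothetical sanity check: reject connections whose advertised server id is
// not in the configured ensemble. A Nagios probe whose bytes happen to parse
// as an id would still be rejected unless that id is actually configured.
public class ServerIdCheck {
    // Wildcard id for observers, as discussed in ZOOKEEPER-933 (value assumed).
    static final long OBSERVER_ID = Long.MAX_VALUE;

    static boolean acceptConnection(long remoteSid, Set<Long> configuredIds,
                                    boolean allowObservers) {
        if (remoteSid == OBSERVER_ID) {
            // Observers connect under a wildcard, so a pure membership test
            // would break them; this is the interaction flagged above.
            return allowObservers;
        }
        return configuredIds.contains(remoteSid);
    }

    public static void main(String[] args) {
        Set<Long> ids = Set.of(0L, 1L, 2L, 3L, 4L);
        if (!acceptConnection(2L, ids, true)) {
            throw new AssertionError("configured peer rejected");
        }
        if (acceptConnection(42L, ids, true)) {
            throw new AssertionError("unknown peer accepted");
        }
        if (!acceptConnection(OBSERVER_ID, ids, true)) {
            throw new AssertionError("observer wildcard rejected");
        }
        System.out.println("ok");
    }
}
```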
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933928#action_12933928 ] Flavio Junqueira commented on ZOOKEEPER-880:

I think we agree that monitoring alone was not causing the issue. But your logs indicate that there were some orphan threads due to the monitoring, and we can see it from excerpts of your logs like the one I posted above. Without the monitoring, the same problem is still triggered, though apparently in a different way, and it is not clear why. You can see it from all the Channel eof messages in the log. To solve this issue, we need to understand the following:
1. What's causing those IOExceptions?
2. Why are we even starting a new connection if there is no leader election going on?

Do you folks have any idea if there is anything in your environment that could be causing those TCP connections to break?
[jira] Commented: (ZOOKEEPER-933) Remove wildcard QuorumPeer.OBSERVER_ID
[ https://issues.apache.org/jira/browse/ZOOKEEPER-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933709#action_12933709 ] Flavio Junqueira commented on ZOOKEEPER-933:

+1 for the idea, sounds right to me.

Remove wildcard QuorumPeer.OBSERVER_ID

Key: ZOOKEEPER-933
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-933
Project: Zookeeper
Issue Type: Sub-task
Reporter: Vishal K
Fix For: 3.4.0

1. I have a question about the following piece of code in QCM:

{noformat}
if (remoteSid == QuorumPeer.OBSERVER_ID) {
    /*
     * Choose identifier at random. We need a value to identify
     * the connection.
     */
    remoteSid = observerCounter--;
    LOG.info("Setting arbitrary identifier to observer: " + remoteSid);
}
{noformat}

Should we allow this? The problem with this code is that if a peer connects twice with QuorumPeer.OBSERVER_ID, we will end up creating threads for this peer twice, which could result in redundant SendWorker/RecvWorker threads. I haven't used observers yet. The documentation at http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperObservers.html says that, just like followers, observers should have server IDs. In which case, why do we want to provide a wildcard?
[jira] Commented: (ZOOKEEPER-934) Add sanity check for server ID
[ https://issues.apache.org/jira/browse/ZOOKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933713#action_12933713 ] Flavio Junqueira commented on ZOOKEEPER-934:

I was not thinking about OBSERVER_ID; good point. I think that should do it.
[jira] Commented: (ZOOKEEPER-918) Review of BookKeeper Documentation (Sequence flow and failure scenarios)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932902#action_12932902 ] Flavio Junqueira commented on ZOOKEEPER-918:

Amit, just to give you an update: we have been discussing switching to a new documentation system soon (ZOOKEEPER-925), so we were wondering if it would be a problem to wait until we have it. Assuming the new system is easier to work with, we can more easily introduce your notes into the release documentation. Does that sound ok? If we take too long, then we can rethink it and find another way, like creating a wiki page or committing the pdf directly and linking to it from the BK documentation.

Review of BookKeeper Documentation (Sequence flow and failure scenarios)

Key: ZOOKEEPER-918
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-918
Project: Zookeeper
Issue Type: Task
Components: documentation
Reporter: Amit Jaiswal
Assignee: Amit Jaiswal
Priority: Minor
Fix For: 3.3.3, 3.4.0
Attachments: BookKeeperInternals.doc, BookKeeperInternals.pdf
Original Estimate: 2h
Remaining Estimate: 2h

I have prepared a document describing some of the internals of BookKeeper in terms of:
1. Sequence of operations
2. Files layout
3. Failure scenarios

The document was prepared mostly by reading the code. Could somebody who understands the design please review it?
[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932909#action_12932909 ] Flavio Junqueira commented on ZOOKEEPER-922:

Hi Camille, say a client disconnects from server A and reconnects to server B with the same session. Server A believes the session should be expired because it received an exception. Server B believes the session should stay alive, since the client just reconnected. What should we do in this case: kill the session or not? Our suggestion is to have an option that enables fast expiration and disables clients moving sessions to other servers. We are certainly not proposing to remove the second piece of functionality from ZooKeeper altogether.

enable faster timeout of sessions in case of unexpected socket disconnect

Key: ZOOKEEPER-922
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-922
Project: Zookeeper
Issue Type: Improvement
Components: server
Reporter: Camille Fournier
Assignee: Camille Fournier
Fix For: 3.4.0
Attachments: ZOOKEEPER-922.patch

In the case when a client connection is closed due to a socket error, instead of the client calling close explicitly, it would be nice to enable the session associated with that client to time out faster than the negotiated session timeout. This would enable a zookeeper ensemble that is acting as a dynamic discovery provider to remove ephemeral nodes for crashed clients quickly, while allowing a longer heartbeat-based timeout for java clients that need to do long stop-the-world GC. I propose doing this by setting the timeout associated with the crashed session to minSessionTimeout.
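The proposal in the issue description can be reduced to one rule: on an unclean disconnect, shrink the session's remaining timeout to minSessionTimeout instead of waiting out the negotiated value. A hypothetical standalone sketch (class and field names are illustrative; the real change would touch the server's session tracking), including the opt-in flag discussed in the comment:

```java
// Illustrative model of fast session expiration on unclean socket close.
public class FastExpire {
    static final int MIN_SESSION_TIMEOUT_MS = 4000; // assumed value

    static class Session {
        int timeoutMs;
        Session(int negotiatedMs) { this.timeoutMs = negotiatedMs; }
    }

    // Called when the connection dies with an IOException rather than a
    // client-initiated close. Only shrinks the timeout, never grows it, and
    // only when the operator has opted in (so session moving can stay safe).
    static void onUncleanDisconnect(Session s, boolean fastExpireEnabled) {
        if (fastExpireEnabled && s.timeoutMs > MIN_SESSION_TIMEOUT_MS) {
            s.timeoutMs = MIN_SESSION_TIMEOUT_MS;
        }
    }

    public static void main(String[] args) {
        Session s = new Session(30000); // long timeout for GC-prone clients
        onUncleanDisconnect(s, true);
        if (s.timeoutMs != MIN_SESSION_TIMEOUT_MS) {
            throw new AssertionError("expected fast expiration timeout");
        }
        Session t = new Session(30000);
        onUncleanDisconnect(t, false); // option disabled: nothing changes
        if (t.timeoutMs != 30000) {
            throw new AssertionError("timeout changed with option disabled");
        }
        System.out.println("ok");
    }
}
```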
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932974#action_12932974 ] Flavio Junqueira commented on ZOOKEEPER-900:

+1, great job, Vishal! On your question, the problem is that it is not easy to decide when a peer can close its connections, because it doesn't know what state the others are in and it might still need to receive and respond to notifications. In any case, if you have an idea for how to do it and want to discuss it further, we could create a new jira and work there, since this is a separate issue.

FLE implementation should be improved to use non-blocking sockets

Key: ZOOKEEPER-900
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900
Project: Zookeeper
Issue Type: Bug
Reporter: Vishal K
Assignee: Vishal K
Priority: Critical
Fix For: 3.4.0
Attachments: ZOOKEEPER-900.patch, ZOOKEEPER-900.patch1, ZOOKEEPER-900.patch2

From earlier email exchanges:

1. Blocking connects and accepts:

a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnect. AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne starts a timer. connectOne continues with normal operations if the connection is established before the timer expires; otherwise, when the timer expires it interrupts the AsyncConnect thread and returns. In this way, I can have an upper bound on the amount of time we need to wait for connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use a Selector to do non-blocking connects/accepts. I am planning to do that later once we at least have a quick fix for the problem and consensus from others on the real fix (this problem is a big blocker for us). Note that it is OK to do blocking IO in the SenderWorker and RecvWorker threads, since they each block on IO to their respective peer.

b) The blocking IO problem is not just restricted to connectOne(), but also exists in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection does blocking IO to get the peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the peer that had sent the connection request. All of this is happening from the Listener. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connectOne(). Also, the code has an inherent cycle: initiateConnection() and receiveConnection() will have to be very carefully synchronized, otherwise we could run into deadlocks. This code is going to be difficult to maintain/modify.

Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822
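The "real fix" the description points to, a Selector-based connect with a bounded wait instead of a helper thread plus timer, can be sketched roughly like this (addresses and the timeout are illustrative; this is a standalone sketch, not the committed patch):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Non-blocking connect with an upper bound on the wait: register the channel
// for OP_CONNECT and let select(timeout) bound how long we block, instead of
// an interruptible AsyncConnect thread plus timer.
public class TimedConnect {
    static SocketChannel connectWithTimeout(InetSocketAddress addr, long timeoutMs)
            throws IOException {
        SocketChannel ch = SocketChannel.open();
        ch.configureBlocking(false);
        if (ch.connect(addr)) {
            return ch; // connected immediately (possible on loopback)
        }
        try (Selector sel = Selector.open()) {
            ch.register(sel, SelectionKey.OP_CONNECT);
            if (sel.select(timeoutMs) > 0) {
                try {
                    if (ch.finishConnect()) {
                        return ch;
                    }
                } catch (IOException e) {
                    // connection refused/reset: fall through and give up
                }
            }
        }
        ch.close();
        return null; // timed out or failed; caller can retry later
    }

    public static void main(String[] args) throws IOException {
        // Self-test against a local listening socket.
        try (ServerSocket ss = new ServerSocket(0)) {
            SocketChannel ch = connectWithTimeout(
                    new InetSocketAddress("127.0.0.1", ss.getLocalPort()), 2000);
            if (ch == null) {
                throw new AssertionError("expected connect to succeed");
            }
            ch.close();
        }
        System.out.println("ok");
    }
}
```

The same Selector could also multiplex the accepts and reads that currently block the Listener thread, which is the broader restructuring point (b) argues for.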
[jira] Updated: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-900:

Resolution: Fixed
Status: Resolved (was: Patch Available)

Committed revision 1036071.
[jira] Commented: (ZOOKEEPER-902) Fix findbug issue in trunk Malicious code vulnerability
[ https://issues.apache.org/jira/browse/ZOOKEEPER-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932989#action_12932989 ] Flavio Junqueira commented on ZOOKEEPER-902:

Agreed, I've seen that 900 didn't include it. I'd rather let Pat take care of wrapping up this issue...

Fix findbug issue in trunk Malicious code vulnerability

Key: ZOOKEEPER-902
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-902
Project: Zookeeper
Issue Type: Bug
Components: quorum, server
Affects Versions: 3.4.0
Reporter: Patrick Hunt
Priority: Minor
Fix For: 3.4.0

https://hudson.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/970/artifact/trunk/findbugs/zookeeper-findbugs-report.html#Warnings_MALICIOUS_CODE

Malicious code vulnerability warning:
MS: org.apache.zookeeper.server.quorum.LeaderElection.epochGen isn't final but should be
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932999#action_12932999 ] Flavio Junqueira commented on ZOOKEEPER-900:

Ok, there might have been some confusion. I saw the patch-available flag up and interpreted it as ready to commit (after review, of course). If you still think there is work to be done on this jira, Vishal, please consider reopening it and creating sub-tasks. From your comments, I can extract at least three possible tasks. Once you create sub-tasks (or new independent jiras), I will comment on your questions. I'd rather do that so that we don't mix up the discussion. Is that ok?
[jira] Commented: (ZOOKEEPER-933) Remove wildcard QuorumPeer.OBSERVER_ID
[ https://issues.apache.org/jira/browse/ZOOKEEPER-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933034#action_12933034 ] Flavio Junqueira commented on ZOOKEEPER-933:

Hi Vishal, the reason for the wildcard is explained in ZOOKEEPER-599. I'd rather keep this feature for the reasons explained before, but it would be good to prevent the case you mention.
[jira] Commented: (ZOOKEEPER-934) Add sanity check for server ID
[ https://issues.apache.org/jira/browse/ZOOKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12933045#action_12933045 ] Flavio Junqueira commented on ZOOKEEPER-934:

It sounds like we need to do this so that we don't get affected by port scanners or monitoring systems. However, I'm not sure if this impacts the observers feature we are discussing in the other jira (ZOOKEEPER-933). It sounds like it does, but I need to verify. Any thoughts?
[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932500#action_12932500 ] Flavio Junqueira commented on ZOOKEEPER-922:

Hi! I'm confused by this proposal. What happens if the client disconnects from one server and moves to another? Or do you want to be able to disable that feature as well?
[jira] Commented: (ZOOKEEPER-922) enable faster timeout of sessions in case of unexpected socket disconnect
[ https://issues.apache.org/jira/browse/ZOOKEEPER-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932512#action_12932512 ] Flavio Junqueira commented on ZOOKEEPER-922:

I think I understand your motivation, but I'm not sure it will work the way you expect it to. I'm afraid that you might end up getting lots of false positives due to delays introduced by the environment (e.g., jvm gc). Let me clarify one thing first: when you refer to clients crashing, are you thinking of the jvm crashing or the whole machine becoming unavailable? Basically, my question is whether you really expect connections to be cleanly closed or not.
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932228#action_12932228 ] Flavio Junqueira commented on ZOOKEEPER-900:

If we fix the findbugs issue here, then we should just close ZOOKEEPER-902, stating that it was resolved in ZOOKEEPER-900.
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12931353#action_12931353 ] Flavio Junqueira commented on ZOOKEEPER-900: Hi Vishal, This is a good question. I'm actually assuming that the behavior of TCP is such that if I send a message and then close the channel properly (calling close()), due to the reliability and order guarantees of the connection, the message will get through before the connection closes. Essentially, I'm relying upon the TCP ACK to do exactly what you're proposing. However, it might be a good idea to make sure that the assumption is correct; if you know the answer already, just let me know. Overall I do agree that having an ACK is important.
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12931460#action_12931460 ] Flavio Junqueira commented on ZOOKEEPER-900: That's a pretty strong statement. You're essentially suggesting that we shouldn't rely upon TCP to implement even its basic functionality. Also, my understanding is that Vishal is just reasoning about the code and he hasn't been able to reproduce that situation. Please correct me if I'm mistaken, Vishal.
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12931464#action_12931464 ] Flavio Junqueira commented on ZOOKEEPER-880: Benoit, just to clarify, is this also due to monitoring or scanning? QuorumCnxManager$SendWorker grows without bounds Key: ZOOKEEPER-880 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880 Project: Zookeeper Issue Type: Bug Affects Versions: 3.2.2 Reporter: Jean-Daniel Cryans Priority: Critical Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack, TRACE-hbase-hadoop-zookeeper-sv4borg9.log.gz We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads, up to a point where the OS runs out of native threads, and at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like: {noformat} tickTime=3000 dataDir=/somewhere_thats_not_tmp clientPort=2181 initLimit=10 syncLimit=5 server.0=sv4borg9:2888:3888 server.1=sv4borg10:2888:3888 server.2=sv4borg11:2888:3888 server.3=sv4borg12:2888:3888 server.4=sv4borg13:2888:3888 {noformat} The issue is on the first server. I'm going to attach thread dumps and logs in a moment.
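One way the unbounded SendWorker growth can be avoided is to tie each receiver to its sender so that a broken channel tears down both threads, not just the receiving one. A minimal sketch with invented class shapes (simplified stand-ins, not the actual QuorumCnxManager implementation):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch only: a receiver that shuts down its paired sender when the channel breaks.
public class PairedWorkers {
    static class SendWorker extends Thread {
        volatile boolean running = true;
        void finish() { running = false; interrupt(); }
        public void run() {
            while (running) {
                try { Thread.sleep(100); }          // stand-in for draining a send queue
                catch (InterruptedException e) { return; }
            }
        }
    }

    static class RecvWorker extends Thread {
        final Socket sock;
        final SendWorker sender;
        RecvWorker(Socket sock, SendWorker sender) { this.sock = sock; this.sender = sender; }
        void finish() {
            sender.finish();                        // tear down the counterpart too
            try { sock.close(); } catch (IOException ignored) { }
        }
        public void run() {
            try {
                if (sock.getInputStream().read() < 0) finish();   // EOF: peer went away
            } catch (IOException broken) {
                finish();                           // broken channel kills both workers
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);
        Socket local = new Socket("127.0.0.1", server.getLocalPort());
        Socket remote = server.accept();
        SendWorker sender = new SendWorker();
        RecvWorker receiver = new RecvWorker(local, sender);
        sender.start();
        receiver.start();
        remote.close();                             // simulate the peer disappearing
        receiver.join(5000);
        sender.join(5000);
        System.out.println("senderStopped=" + !sender.isAlive());
        server.close();
    }
}
```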
[jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
[ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12931470#action_12931470 ] Flavio Junqueira commented on ZOOKEEPER-900: Sure, I can investigate a little further, and Vishal let us know if you find anything.
[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930953#action_12930953 ] Flavio Junqueira commented on ZOOKEEPER-928: Good point, Pat. I should have remembered this, since our hack to introduce the connection timeout in QCM previously was through the socket directly, so it makes sense that we would have to do the same for other blocking operations. In fact, I quickly tried replacing the read call in receiveConnection with the following: {noformat} s.socket().getInputStream().read(msgBytes); {noformat} and I get a SocketTimeoutException after the specified timeout. Follower should stop following and start FLE if it does not receive pings from the leader - Key: ZOOKEEPER-928 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-928 Project: Zookeeper Issue Type: Bug Components: quorum, server Affects Versions: 3.3.2 Reporter: Vishal K Priority: Critical In Follower.followLeader(), after syncing with the leader, the follower does: while (self.isRunning()) { readPacket(qp); processPacket(qp); } It looks like it relies on socket timeout expiry to figure out if the connection with the leader has gone down. So a follower *with no clients* may never notice a faulty leader if the Leader has a software hang but the TCP connections with the peers are still valid. Since it has no clients, it won't heartbeat with the Leader. If a majority of followers are not connected to any clients, then FLE will fail even if other followers attempt to elect a new leader. We should keep track of pings received from the leader, and if we haven't seen a ping packet from the leader for (syncLimit * tickTime) time, give up following the leader.
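The workaround quoted in the comment above, reading through the channel's socket input stream so that SO_TIMEOUT applies, can be reproduced in isolation. A sketch under the assumption that the channel is in blocking mode (the adaptor stream read would throw IllegalBlockingModeException otherwise):

```java
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.nio.channels.SocketChannel;

public class StreamReadTimeout {
    public static void main(String[] args) throws Exception {
        // Server that accepts the connection but never sends anything.
        ServerSocket server = new ServerSocket(0);
        SocketChannel ch = SocketChannel.open(
                new InetSocketAddress("127.0.0.1", server.getLocalPort()));
        ch.socket().setSoTimeout(200);              // SO_TIMEOUT in milliseconds

        byte[] msgBytes = new byte[32];
        boolean timedOut = false;
        try {
            // Reading via the adaptor's stream honors SO_TIMEOUT,
            // unlike SocketChannel.read(ByteBuffer).
            InputStream in = ch.socket().getInputStream();
            in.read(msgBytes);
        } catch (SocketTimeoutException expected) {
            timedOut = true;
        }
        System.out.println("timedOut=" + timedOut);
        ch.close();
        server.close();
    }
}
```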
[jira] Commented: (ZOOKEEPER-909) Extract NIO specific code from ClientCnxn
[ https://issues.apache.org/jira/browse/ZOOKEEPER-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930546#action_12930546 ] Flavio Junqueira commented on ZOOKEEPER-909: Thomas, Check the console output on hudson, close to the end of the page. The failure seems to be in the C tests. Extract NIO specific code from ClientCnxn - Key: ZOOKEEPER-909 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-909 Project: Zookeeper Issue Type: Sub-task Components: java client Reporter: Thomas Koch Assignee: Thomas Koch Fix For: 3.4.0 Attachments: ClientCnxnSocketNetty.java, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch, ZOOKEEPER-909.patch This patch is mostly the same as my last one for ZOOKEEPER-823, minus everything Netty related. This means this patch only extracts all NIO-specific code into the class ClientCnxnSocketNIO, which extends ClientCnxnSocket. I've redone this patch from current trunk step by step now and couldn't find any logical error. I've already done a couple of successful test runs and will continue to do so tonight. It would be nice if we could apply this patch as soon as possible to trunk. This allows us to continue to work on the Netty integration without blocking on the ClientCnxn class. Adding Netty after this patch should be only a matter of adding the ClientCnxnSocketNetty class with the appropriate test cases. You could help me by reviewing the patch and by running it on whatever test server you have available. Please send me any complete failure log you should encounter to thomas at koch point ro. Thx! Update: Until now, I've collected 8 successful builds in a row!
[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930774#action_12930774 ] Flavio Junqueira commented on ZOOKEEPER-928: I've just seen the messages on zookeeper-dev, and I'm not sure this is right: # readPacket is implemented in Learner.java, and the socket read is performed in this line: leaderIs.readRecord(pp, packet); # leaderIs is an InputArchive instance instantiated in Learner:connectToLeader; # The socket used to instantiate leaderIs has its SO_TIMEOUT value set right before in connectToLeader: sock.setSoTimeout(self.tickTime * self.initLimit). Consequently, the operation should not be delayed indefinitely and should return after self.tickTime * self.initLimit. This discussion on SO_TIMEOUT sounds familiar, huh? ;-)
[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930788#action_12930788 ] Flavio Junqueira commented on ZOOKEEPER-928: Hi Vishal, My understanding is that the readRecord call in readPacket will timeout, even if the TCP connection is still up. The documentation in: http://download.oracle.com/javase/6/docs/api/java/net/SocketOptions.html says that: {noformat} static int SO_TIMEOUT Set a timeout on blocking Socket operations: {noformat}
[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930800#action_12930800 ] Flavio Junqueira commented on ZOOKEEPER-928: My understanding is that SO_TIMEOUT also affects SocketChannel, since it builds on top of a Socket object.
[jira] Commented: (ZOOKEEPER-928) Follower should stop following and start FLE if it does not receive pings from the leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930851#action_12930851 ] Flavio Junqueira commented on ZOOKEEPER-928: The documentation refers to SocketInputStream.read(), but it doesn't mention SocketChannel.read(). I ran a quick test with QuorumCnxManager and it doesn't seem to work. So maybe it is true that setting SO_TIMEOUT has no effect on SocketChannel.read(), which is kind of surprising to me.
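The quick test described in the comment can be reproduced along the following lines; the watchdog thread is just an illustration device. On the JDKs I'm aware of, SO_TIMEOUT set via the adaptor socket does not bound a blocking SocketChannel.read():

```java
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

public class ChannelReadIgnoresSoTimeout {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);   // accepts, never writes
        SocketChannel ch = SocketChannel.open(
                new InetSocketAddress("127.0.0.1", server.getLocalPort()));
        ch.socket().setSoTimeout(100);               // would fire quickly, if honored

        Thread reader = new Thread(() -> {
            try {
                ch.read(ByteBuffer.allocate(32));    // blocks with no timeout
            } catch (Exception closed) {
                // AsynchronousCloseException when main closes the channel below
            }
        });
        reader.start();
        reader.join(600);                            // well past the 100 ms timeout
        System.out.println("stillBlocked=" + reader.isAlive());
        ch.close();                                  // unblock the reader
        reader.join();
        server.close();
    }
}
```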
Re: [VOTE] Release ZooKeeper 3.3.2 (candidate 0)
+1, unit tests pass. Also ran a few manual tests. I must say that on one of the computers I tried, AsyncHammerTest fails, and the error message I get is that there are no tests. Discussing with Pat, we ended up concluding that it is most likely a configuration problem. I don't think that's a reason to -1 it, though.
-Flavio
On Nov 11, 2010, at 12:24 AM, Henry Robinson wrote:
+1. Python looks good.
On 10 November 2010 14:51, Michi Mutsuzaki mic...@yahoo-inc.com wrote:
+1. I ran my benchmark test on the release candidate for one hour, and got similar numbers as 3.3.0. --Michi
On 11/10/10 11:09 AM, "Mahadev Konar" maha...@yahoo-inc.com wrote:
+1 for the release. Ran ant test and a couple of smoke tests. Created znodes and shut down zookeeper servers to test durability. Deleted znodes to make sure they are deleted. Shot down servers one at a time to confirm correct behavior. Thanks, mahadev
On 11/4/10 11:17 PM, "Patrick Hunt" ph...@apache.org wrote:
I've created a candidate build for ZooKeeper 3.3.2. This is a bug fix release addressing twenty-six issues (eight critical) -- see the release notes for details.
*** Please download, test and VOTE before the
*** vote closes 11pm pacific time, Tuesday, November 9.
*** http://people.apache.org/~phunt/zookeeper-3.3.2-candidate-0/
Should we release this? Patrick
-- Henry Robinson, Software Engineer, Cloudera, 415-994-6679
flavio junqueira, research scientist, f...@yahoo-inc.com, direct +34 93-183-8828, avinguda diagonal 177, 8th floor, barcelona, 08018, es, phone (408) 349 3300, fax (408) 349 3301
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930201#action_12930201 ] Flavio Junqueira commented on ZOOKEEPER-925: I'm fine with moving to a different doc system and having our own look-and-feel, but my main concern is having doc generation that is relatively easy to use. If it is difficult to use, then contributors won't feel very motivated to write documentation... It would be great to get folks to stop whining when they have to write documentation, and stop blaming Forrest. :-) To be fair, I must say that my experience with Forrest hasn't been great. Having to insert tags by hand and not being able to find descriptions for tags easily made it hard for me to like Forrest. The output looks good to me, though. Consider maven site generation to replace our forrest site and documentation generation --- Key: ZOOKEEPER-925 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925 Project: Zookeeper Issue Type: Wish Components: documentation Reporter: Patrick Hunt See WHIRR-19 for some background. In whirr we looked at a number of site/doc generation facilities. In the end the Maven site generation plugin turned out to be by far the best option.
You can see our nascent site here (no attempt at styling, etc. so far): http://incubator.apache.org/whirr/ In particular take a look at the quick start: http://incubator.apache.org/whirr/quick-start-guide.html which was generated from http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence (notice this was standard wiki markup: confluence wiki markup, same as available from apache). You can read more about the mvn site plugin here: http://maven.apache.org/guides/mini/guide-site.html Note that other formats are available, not just confluence markup; you can even use different markup formats in the same site (probably not a great idea, but in some cases it might be handy; for example, since whirr uses the confluence wiki, we can pretty much copy/paste source docs from the wiki to our site (svn) if we like). Re maven vs our current ant based build: it's probably a good idea for us to move the build to maven at some point. We could initially move just the doc generation, and then incrementally move functionality from build.xml to mvn over a longer time period.
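For reference, the mvn site setup described above boils down to a small pom fragment plus markup sources under src/site. A sketch only; the plugin and Doxia module versions here are illustrative, not taken from the whirr build:

```xml
<!-- pom.xml fragment: site generation from wiki-style sources in src/site -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-site-plugin</artifactId>
      <version>2.1.1</version>
      <dependencies>
        <!-- Doxia module that understands confluence wiki markup -->
        <dependency>
          <groupId>org.apache.maven.doxia</groupId>
          <artifactId>doxia-module-confluence</artifactId>
          <version>1.1.2</version>
        </dependency>
      </dependencies>
    </plugin>
  </plugins>
</build>
```

Pages then live as plain wiki files, e.g. src/site/confluence/quick-start-guide.confluence, and `mvn site` renders them to HTML.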
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930214#action_12930214 ] Flavio Junqueira commented on ZOOKEEPER-925: I was wondering if by getting away from checking in generated docs, you mean that anyone should be able to come and change docs freely.
[jira] Commented: (ZOOKEEPER-914) QuorumCnxManager blocks forever
[ https://issues.apache.org/jira/browse/ZOOKEEPER-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929675#action_12929675 ] Flavio Junqueira commented on ZOOKEEPER-914: Hi Vishal, The Socket documentation does sound ambiguous, but my understanding is that SO_TIMEOUT is for blocking mode, not non-blocking mode. Non-blocking calls return immediately, so they shouldn't need a timeout value, no? Independent of using it or not, I would be curious to learn if my understanding is incorrect. About the release to include the fix, I think Mahadev later came and changed it to 3.3.3. It is fine with me, and we just need to check what the schedule for 3.3.3 is. My preference is to work directly on ZOOKEEPER-900 (or 901, which I think might be a more significant change), if you think we can produce a patch in time for 3.3.3. QuorumCnxManager blocks forever Key: ZOOKEEPER-914 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-914 Project: Zookeeper Issue Type: Bug Components: leaderElection Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.3, 3.4.0 This was a disaster. While testing our application we ran into a scenario where a rebooted follower could not join the cluster.
Further debugging showed that the follower could not join because the QuorumCnxManager on the leader was blocked for an indefinite amount of time in receiveConnection():
{noformat}
Thread-3 prio=10 tid=0x7fa920005800 nid=0x11bb runnable [0x7fa9275ed000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        - locked 0x7fa93315f988 (a java.lang.Object)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:210)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:501)
{noformat}
I had pointed out this bug along with several other problems in QuorumCnxManager earlier in https://issues.apache.org/jira/browse/ZOOKEEPER-900 and https://issues.apache.org/jira/browse/ZOOKEEPER-822. I forgot to patch this one as a part of ZOOKEEPER-822. I am working on a fix and a patch will be out soon. The problem is that QuorumCnxManager is using SocketChannel in blocking mode. It does a read() in receiveConnection() and a write() in initiateConnection(). Sorry, but this is really bad programming. It also points to a lack of failure tests for QuorumCnxManager.
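The failure mode above, a single stalled handshake read wedging the Listener, goes away if the blocking read is moved off the accept thread and bounded with a timeout. A sketch with invented names (not the actual fix that was committed): a stalled peer that never sends its id cannot stop a well-behaved peer from being served.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class NonStallingListener {
    public static void main(String[] args) throws Exception {
        ServerSocket listener = new ServerSocket(0);
        ExecutorService pool = Executors.newCachedThreadPool();
        BlockingQueue<String> received = new LinkedBlockingQueue<>();

        Thread acceptLoop = new Thread(() -> {
            try {
                while (true) {
                    Socket peer = listener.accept();
                    pool.submit(() -> {             // handshake off the accept thread
                        try {
                            peer.setSoTimeout(2000); // bound the blocking read
                            BufferedReader in = new BufferedReader(
                                    new InputStreamReader(peer.getInputStream()));
                            String line = in.readLine();
                            if (line != null) received.add(line);
                        } catch (IOException ignored) { }
                    });
                }
            } catch (IOException done) { }           // listener closed, exit loop
        });
        acceptLoop.start();

        // "Bad" peer: connects but never sends its id.
        Socket bad = new Socket("127.0.0.1", listener.getLocalPort());
        // "Good" peer: connects afterwards and sends its id.
        Socket good = new Socket("127.0.0.1", listener.getLocalPort());
        good.getOutputStream().write("server.2\n".getBytes());
        good.getOutputStream().flush();

        // The good peer is served even though the bad one is stalled.
        String got = received.poll(3, TimeUnit.SECONDS);
        System.out.println("received=" + got);

        bad.close(); good.close(); listener.close(); pool.shutdownNow();
    }
}
```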
[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929678#action_12929678 ] Flavio Junqueira commented on ZOOKEEPER-917: Hi Vishal, There is possibly a misunderstanding here. Server 2 reported in this jira (the leader) does not go back to an earlier epoch, but the other two do, and they are following, so if I understand your argument correctly, the exception is being applied as you suggest. Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.168.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet.
[jira] Commented: (ZOOKEEPER-925) Consider maven site generation to replace our forrest site and documentation generation
[ https://issues.apache.org/jira/browse/ZOOKEEPER-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929988#action_12929988 ] Flavio Junqueira commented on ZOOKEEPER-925: Pat, Any thoughts on how it would be to port from Forrest to Maven site generation? Consider maven site generation to replace our forrest site and documentation generation --- Key: ZOOKEEPER-925 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-925 Project: Zookeeper Issue Type: Wish Components: documentation Reporter: Patrick Hunt See WHIRR-19 for some background. In whirr we looked at a number of site/doc generation facilities. In the end Maven site generation plugin turned out to be by far the best option. You can see our nascent site here (no attempt at styling,etc so far): http://incubator.apache.org/whirr/ In particular take a look at the quick start: http://incubator.apache.org/whirr/quick-start-guide.html which was generated from http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence notice this was standard wiki markup (confluence wiki markup, same as available from apache) You can read more about mvn site plugin here: http://maven.apache.org/guides/mini/guide-site.html Notice that other formats are available, not just confluence markup, also note that you can use different markup formats if you like in the same site (although probably not a great idea, but in some cases might be handy, for example whirr uses the confluence wiki, so we can pretty much copy/paste source docs from wiki to our site (svn) if we like) Re maven vs our current ant based build. It's probably a good idea for us to move the build to maven at some point. We could initially move just the doc generation, and then incrementally move functionality from build.xml to mvn over a longer time period. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-914) QuorumCnxManager blocks forever
[ https://issues.apache.org/jira/browse/ZOOKEEPER-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929345#action_12929345 ] Flavio Junqueira commented on ZOOKEEPER-914: Hi Vishal, I also appreciate your contributions and your comments. I also understand your frustration when you find issues with the code, but I think that it is possibly equally frustrating for the developer who thought that at least basic issues were covered, so please try to think that we don't introduce bugs on purpose (at least I don't) and our review process is not perfect. Regarding clover reports, we have agreed already that code coverage is not bulletproof, and in fact there have been several other metrics proposed in the scientific literature, but it does indicate that some call path including a given piece of code was exercised. It certainly doesn't measure more complex cases, like race conditions, crashes and so on. In fact, if you have a better way of measuring test coverage, I'd be happy to hear about it. I'm not sure if you agree, but it seems to me that we should close this jira because the technical discussion here seems to be similar to the one of ZOOKEEPER-900. I'll try to address the concerns you raised regardless of what will happen to this jira: # My point about SO_TIMEOUT comes from here: http://download.oracle.com/javase/6/docs/api/java/net/Socket.html#setSoTimeout%28int%29 # I obviously prefer to go with real fixes instead of hacking, but we need to have release 3.3.2 out, and it sounded like introducing a configurable timeout would fix your problem until the next release; # About testing beyond the handshake, I'm not sure what you're proposing. If the blocking calls are part of the handshake and this is what is failing for you, then this is what we should target now, no? 
QuorumCnxManager blocks forever Key: ZOOKEEPER-914 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-914 Project: Zookeeper Issue Type: Bug Components: leaderElection Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.3, 3.4.0 This was a disaster. While testing our application we ran into a scenario where a rebooted follower could not join the cluster. Further debugging showed that the follower could not join because the QuorumCnxManager on the leader was blocked for indefinite amount of time in receiveConnect() Thread-3 prio=10 tid=0x7fa920005800 nid=0x11bb runnable [0x7fa9275ed000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.FileDispatcher.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233) at sun.nio.ch.IOUtil.read(IOUtil.java:206) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236) - locked 0x7fa93315f988 (a java.lang.Object) at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:210) at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:501) I had pointed out this bug along with several other problems in QuorumCnxManager earlier in https://issues.apache.org/jira/browse/ZOOKEEPER-900 and https://issues.apache.org/jira/browse/ZOOKEEPER-822. I forgot to patch this one as a part of ZOOKEEPER-822. I am working on a fix and a patch will be out soon. The problem is that QuorumCnxManager is using SocketChannel in blocking mode. It does a read() in receiveConnection() and a write() in initiateConnection(). Sorry, but this is really bad programming. Also, points out to lack of failure tests for QuorumCnxManager. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929354#action_12929354 ] Flavio Junqueira commented on ZOOKEEPER-917: Hi Vishal, I certainly understand that not having dedicated development time is an issue. I actually didn't know you were interested in cluster membership... I'm glad to hear it, though. On your questions: # Suppose we have an ensemble comprising 3 servers: A, B, and C. Now suppose that C is the leader, and both A and B follow C. If A disconnects from C for whatever reason (e.g., network partition) and it tries to elect a leader, it won't find any other process in the LOOKING state. It will actually receive a notification from C saying that it is leading and one from B saying that it is following C, both with an earlier leader election epoch. To avoid having A locked out (not able to elect C as leader), we implemented this exception: a process accepts going back to an earlier leader election epoch only if it receives a notification from the leader saying that it is leading and from a quorum saying that it is following; # I'm not sure if you're referring to the specific problem of this jira or if you are asking about my hypothetical example. Assuming it is the former, the follower (Follower:followLeader()) checks if the leader is proposing an earlier epoch, and if not, it accepts the leader snapshot. Because the epoch is the same, all followers will accept the leader snapshot and follow it. 
Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
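The lock-out exception described in the comment above (a server accepts going back to an earlier leader election epoch only if it hears LEADING from the candidate and FOLLOWING from a quorum) can be sketched as a standalone predicate. All class, enum, and method names below are illustrative stand-ins, not the actual FastLeaderElection code:

```java
import java.util.List;

// Illustrative sketch of the lock-out exception discussed above: a server
// in a later election epoch agrees to follow a leader from an earlier
// epoch only if that leader says LEADING and a quorum says FOLLOWING it.
// Names are hypothetical; this is not the FastLeaderElection implementation.
public class EpochException {
    enum State { LOOKING, FOLLOWING, LEADING }

    static final class Notification {
        final long fromSid;   // sender's server id
        final long leaderSid; // leader the sender supports
        final State state;    // sender's reported state
        Notification(long fromSid, long leaderSid, State state) {
            this.fromSid = fromSid; this.leaderSid = leaderSid; this.state = state;
        }
    }

    // ensembleSize servers; a quorum is a strict majority.
    static boolean mayFollowEarlierEpoch(long candidateSid,
                                         List<Notification> notifications,
                                         int ensembleSize) {
        boolean leaderClaims = false;
        long followers = 0;
        for (Notification n : notifications) {
            if (n.fromSid == candidateSid && n.state == State.LEADING) {
                leaderClaims = true;
            } else if (n.leaderSid == candidateSid && n.state == State.FOLLOWING) {
                followers++;
            }
        }
        // The candidate leader itself counts toward the quorum supporting it.
        return leaderClaims && (followers + 1) > ensembleSize / 2;
    }

    public static void main(String[] args) {
        // 3-server ensemble: server 2 claims LEADING, server 1 follows it.
        List<Notification> notes = List.of(
            new Notification(2, 2, State.LEADING),
            new Notification(1, 2, State.FOLLOWING));
        System.out.println(mayFollowEarlierEpoch(2, notes, 3)); // true
    }
}
```

In this jira's scenario a freshly wiped server can satisfy such a check via stale FOLLOWING notifications, which is what the rest of the thread digs into.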
[jira] Commented: (ZOOKEEPER-918) Review of BookKeeper Documentation (Sequence flow and failure scenarios)
[ https://issues.apache.org/jira/browse/ZOOKEEPER-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928566#action_12928566 ] Flavio Junqueira commented on ZOOKEEPER-918: This is really nice, Amit, thanks. I haven't had a chance to go carefully over the document, but my first reaction is that this should be a live document, and perhaps a wiki page would suit this purpose well. What do you think? Review of BookKeeper Documentation (Sequence flow and failure scenarios) Key: ZOOKEEPER-918 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-918 Project: Zookeeper Issue Type: Task Components: documentation Reporter: Amit Jaiswal Priority: Trivial Fix For: 3.3.3, 3.4.0 Attachments: BookKeeperInternals.pdf Original Estimate: 2h Remaining Estimate: 2h I have prepared a document describing some of the internals of bookkeeper in terms of: 1. Sequence of operations 2. Files layout 3. Failure scenarios The document was prepared mostly by reading the code. Can somebody who understands the design review the same? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928171#action_12928171 ] Flavio Junqueira commented on ZOOKEEPER-917: The program I was using to open your logs was hiding some of the messages for some reason unknown to me. I now understand why the leader was elected in your case and the behavior is legitimate. Let me try to explain. We currently repeat the last notification sent to a given server upon reconnecting to it. This is to avoid problems with messages partially sent, and, assuming no further bugs, the protocol is resilient to message duplicates. At the same time, a server A decides to follow another server B if it receives a message from B saying that B is leading and from a quorum saying that they are following, even if A is in a later election epoch. This mechanism is there to avoid A being locked out of the ensemble in the case it partitions away and comes back later. From your logs, what happens is: # Fresh server 2 receives previous notifications from 0 and 1, and decides to lead; # Server 1 receives the last message from server 0 saying that it is following 2 (which was the previous leader), and the notification from 2 saying that it is leading. Server 1 consequently decides to follow 2; # Server 0 receives the last message from server 1 saying that it is following 2 (which was the previous leader), and the notification from 2 saying that it is leading. Server 0 consequently decides to follow 2. Now the main problem I see is that the followers accept the snapshot from the leader, and they shouldn't, given that they have moved to a later epoch. I suspect that we currently allow a server to come back to an epoch it has been in in the past, again to avoid having a server locked out after being partitioned away and healing, but I need to do some further inspection. 
My overall take is that your case is unfortunately not legitimate, meaning that we don't currently provision for configuration changes. The case you expose in general constitutes a loss of quorum, and that violates one of our core assumptions. In more detail, a quorum supporting a leader must have a non-empty intersection with the quorum of servers that have accepted requests in the previous epoch. Wiping out the state of server 2, by replacing it with a fresh server, leads to the situation in which just one server contains all transactions accepted by a quorum (and possibly committed). If you hadn't replaced server 2 with a fresh server, then either server 2 would have been elected again just the same, and it would be fine because it was previously the leader, or it wouldn't have been elected because the leader was previously another server and the last notifications of 0 and 1 would be supporting a different server. On reconfigurations, we have talked about it (http://wiki.apache.org/hadoop/ZooKeeper/ClusterMembership), but we haven't made enough progress recently and it is currently not implemented. It would be great to get some help here. Let me know if this analysis makes any sense to you, please. Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. 
The node 192.168.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
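The quorum-intersection assumption cited in the comment above (the quorum supporting a new leader must intersect the quorum that accepted requests in the previous epoch) holds by construction for majority quorums, which a small brute-force check can confirm. The class below is an illustrative sketch, not ZooKeeper code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative check of the majority-quorum intersection property cited
// above: any two majorities of the same ensemble share at least one
// server, so some member of the new leader's quorum saw the previous
// epoch's accepted requests -- unless a server's persistent state is wiped.
public class QuorumIntersection {
    // All subsets of {0..n-1} of size > n/2, encoded as bitmasks.
    static List<Integer> majorities(int n) {
        List<Integer> out = new ArrayList<>();
        for (int mask = 1; mask < (1 << n); mask++) {
            if (Integer.bitCount(mask) > n / 2) out.add(mask);
        }
        return out;
    }

    static boolean allPairsIntersect(int n) {
        List<Integer> qs = majorities(n);
        for (int a : qs)
            for (int b : qs)
                if ((a & b) == 0) return false; // disjoint pair found
        return true;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.println(n + " servers: " + allPairsIntersect(n));
        }
    }
}
```

Wiping a server's disk breaks the argument not because quorums stop intersecting, but because the intersecting server may no longer remember what it accepted.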
[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928179#action_12928179 ] Flavio Junqueira commented on ZOOKEEPER-917: Hi Alexandre, It is a key premise of important replication algorithms, like Paxos, that there is a portion of the state that persists across crashes (and recoveries). By replacing server 2 with a fresh server, you simply got rid of the persistent state. In general, making the replacement you've made may lead to trouble due to the problem I exposed a few postings up. Of course, if you wait for a successful election, the problem is supposed to go away because you have reestablished a quorum and this quorum does not contain the faulty server, but then you have to make sure the election happens before you introduce the fresh server, perhaps by checking through JMX or by inspecting the logs. Simply setting a reasonable timeout will work in most cases, but the leader election is not guaranteed to succeed, and there is a chance, likely to be small, that you'll end up with a corrupt state. Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.168.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. 
The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira resolved ZOOKEEPER-917. Resolution: Not A Problem My pleasure to help. I'm marking it as not a problem for now, but feel free to come back and ask for more clarification if needed. Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927869#action_12927869 ] Flavio Junqueira commented on ZOOKEEPER-917: Hi Alexandre, Could you please post your configuration parameters? I noticed the following in both excerpts:
{noformat}
INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 2, LOOKING, LOOKING, 1
INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2, -1, 1, 2, LOOKING, LOOKING, 2
{noformat}
which implies that both servers, 1 and 2, were starting from scratch, and in an ensemble of 3 servers they form a quorum. Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.168.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927908#action_12927908 ] Flavio Junqueira commented on ZOOKEEPER-917: I downloaded your logs, but the out files are empty and I couldn't find the notification messages. By looking at the excerpts you posted, it sounds like node 1 tells 0 that it is following 2, and node 2 says that it is leading (this is fine as node 2 might have received some old messages), so node 0 must follow 2. Now the question is why node 1 decided to follow 2, especially because it has a higher zxid and the follower code should have rejected an attempt to follow a leader from an earlier epoch. It would be nice to have a look at the output of node 1. Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.168.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-882) Startup loads last transaction from snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927909#action_12927909 ] Flavio Junqueira commented on ZOOKEEPER-882: If I understand correctly what you're proposing, I think it won't be necessary to submit two separate patches. To verify that the test fails without the patch, I can simply add the test without applying any other modification in the patch file, and then run the test. After applying the modifications to the code base, I'd be able to verify that the test does not fail any longer. Does it sound right to you? Startup loads last transaction from snapshot Key: ZOOKEEPER-882 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-882 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Minor Fix For: 3.4.0 Attachments: 882.diff, restore, ZOOKEEPER-882.patch On startup, the server first loads the latest snapshot, and then loads from the log starting at the last transaction in the snapshot. It should begin from one past that last transaction in the log. I will attach a possible patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-917) Leader election selected incorrect leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927949#action_12927949 ] Flavio Junqueira commented on ZOOKEEPER-917: Even though the logs do not make a lot of sense for me at this point, I was thinking that your scenario is not supposed to work given our guarantees. Let's look at an example. Suppose we have 3 servers: A, B, and C. Suppose that C is initially the leader and proposes operations that B is able to ack, but A doesn't. Now, suppose that I come and replace C with a fresh server, same id but empty state, and I do it before A and B are able to elect a new leader and recover. In this case, A and C may form a quorum and the state of the ZooKeeper ensemble would be empty. The replacement of server C with a fresh server violates our assumptions. It should work, though, if you add a fresh server with a working ensemble. That is, you let A and B elect a new leader, and then you start the new C server. In your case, I'm still not sure why it happens because the initial zxid of node 1 is 4294967742 according to your excerpt. Leader election selected incorrect leader - Key: ZOOKEEPER-917 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-917 Project: Zookeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.2.2 Environment: Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny Reporter: Alexandre Hardy Priority: Critical Fix For: 3.3.3, 3.4.0 Attachments: zklogs-20101102144159SAST.tar.gz We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.148.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). 
DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-876) Unnecessary snapshot transfers between new leader and followers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-876: --- Status: Open (was: Patch Available) This is a nice catch, Diogo, and the patch looks good to me. I have a few very quick comments: # Instead of returning a pair of longs in startForwarding, we could simply return maxZxid and read lastProposed directly from the leader object. Doesn't that work? # The first comment of startForwarding is not saying much. Could you please expand it? # Could you please explain in the beginning of the test case what it is supposed to be testing? It helps later in remembering what the test does. Good job! Unnecessary snapshot transfers between new leader and followers --- Key: ZOOKEEPER-876 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-876 Project: Zookeeper Issue Type: Bug Affects Versions: 3.3.1 Reporter: Diogo Assignee: Diogo Priority: Minor Fix For: 3.4.0 Attachments: ZOOKEEPER-876.patch When starting a new leadership, unnecessary snapshot transfers happen between the new leader and followers. This is so because of multiple small bugs: 1) the comparison of zxids is done based on a new proposal, instead of the last logged zxid (LearnerFollower.java:310); 2) if a follower is one zxid behind, the check of the interval of committed logs excludes the follower (LearnerFollower.java:269); 3) the bug reported in ZOOKEEPER-874 (commitLogs are empty after recovery). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
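Point 1 of the review above can be illustrated with a stand-in sketch: startForwarding returns only maxZxid, and the caller reads lastProposed directly from the leader object instead of receiving a pair of longs. The classes here are hypothetical mock-ups, not the actual Leader/LearnerHandler code:

```java
// Stand-in sketch of the refactor suggested in the review above: instead
// of startForwarding returning a pair of longs, it returns only maxZxid
// and the caller reads lastProposed directly from the leader object.
// These classes are illustrative, not the actual ZooKeeper Leader code.
public class ForwardingSketch {
    static class Leader {
        long lastProposed; // zxid of the last proposal this leader issued

        // Returns only the upper bound of the forwarded range; the caller
        // can read lastProposed from the leader itself when it needs it.
        long startForwarding(long peerLastZxid) {
            long maxZxid = Math.max(peerLastZxid, lastProposed);
            // ... a real implementation would queue the committed
            //     proposals in (peerLastZxid, maxZxid] here ...
            return maxZxid;
        }
    }

    public static void main(String[] args) {
        Leader leader = new Leader();
        leader.lastProposed = 0x100000005L;
        long maxZxid = leader.startForwarding(0x100000003L);
        // a single return value plus a direct field read replaces the pair
        System.out.println(Long.toHexString(maxZxid) + " "
                + Long.toHexString(leader.lastProposed));
    }
}
```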
[jira] Updated: (ZOOKEEPER-882) Startup loads last transaction from snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-882: --- Status: Open (was: Patch Available) Hi Jared, I was wondering if you can add a test case to your patch. Startup loads last transaction from snapshot Key: ZOOKEEPER-882 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-882 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Minor Fix For: 3.4.0 Attachments: 882.diff, restore, ZOOKEEPER-882.patch On startup, the server first loads the latest snapshot, and then loads from the log starting at the last transaction in the snapshot. It should begin from one past that last transaction in the log. I will attach a possible patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927657#action_12927657 ] Flavio Junqueira commented on ZOOKEEPER-702: Thanks, Abmar. It looks good to me. I have one quick comment, though. Is there any configuration value that could be causing tests to run slower? I have the impression that tests are running slightly slower with your patch. One in particular that caught my attention was QuorumZxidSyncTest:
{noformat}
Trunk:
[junit] Running org.apache.zookeeper.test.QuorumZxidSyncTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 94.55 sec

702:
[junit] Running org.apache.zookeeper.test.QuorumZxidSyncTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 139.985 sec
{noformat}
and this seems to be pretty consistent. GSoC 2010: Failure Detector Model - Key: ZOOKEEPER-702 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702 Project: Zookeeper Issue Type: Wish Reporter: Henry Robinson Assignee: Abmar Barros Fix For: 3.4.0 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch Failure Detector Module Possible Mentor Henry Robinson (henry at apache dot org) Requirements Java, some distributed systems knowledge, comfort implementing distributed systems protocols Description ZooKeeper servers detect the failure of other servers and clients by counting the number of 'ticks' for which they don't get a heartbeat from other machines. 
This is the 'timeout' method of failure detection and works very well; however it is possible that it is too aggressive and not easily tuned for some more unusual ZooKeeper installations (such as in a wide-area network, or even in a mobile ad-hoc network). This project would abstract the notion of failure detection to a dedicated Java module, and implement several failure detectors to compare and contrast their appropriateness for ZooKeeper. For example, Apache Cassandra uses a phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which is much more tunable and has some very interesting properties. This is a great project if you are interested in distributed algorithms, or want to help re-factor some of ZooKeeper's internal code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
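For reference, the phi-accrual idea linked above can be sketched compactly: rather than a boolean timeout, the detector outputs a suspicion level phi = -log10 of the probability that a heartbeat this overdue would still arrive, estimated from observed inter-arrival times. The sketch below uses an exponential inter-arrival model for simplicity (the paper fits a normal distribution); all names are hypothetical, and this is not the patch's code:

```java
// Illustrative phi-accrual failure detector sketch (after the paper
// linked above). Suspicion phi = -log10(P(heartbeat still arrives)),
// here under a simple exponential inter-arrival model; the original
// paper fits a normal distribution instead. Names are hypothetical.
public class PhiAccrualSketch {
    private double meanIntervalMs;     // running mean of heartbeat gaps
    private long lastHeartbeatMs = -1; // timestamp of the last heartbeat
    private long samples = 0;

    void heartbeat(long nowMs) {
        if (lastHeartbeatMs >= 0) {
            double gap = nowMs - lastHeartbeatMs;
            samples++;
            meanIntervalMs += (gap - meanIntervalMs) / samples; // running mean
        }
        lastHeartbeatMs = nowMs;
    }

    // Suspicion level: grows without bound the longer heartbeats are late,
    // instead of flipping from "alive" to "dead" at a fixed tick count.
    double phi(long nowMs) {
        if (samples == 0) return 0.0;
        double elapsed = nowMs - lastHeartbeatMs;
        // Exponential model: P(gap > elapsed) = exp(-elapsed / mean),
        // so phi = -log10(P) = elapsed / (mean * ln(10)).
        return elapsed / (meanIntervalMs * Math.log(10));
    }

    public static void main(String[] args) {
        PhiAccrualSketch fd = new PhiAccrualSketch();
        for (long t = 0; t <= 5000; t += 1000) fd.heartbeat(t); // steady 1s beats
        System.out.println(fd.phi(6000) < fd.phi(16000)); // suspicion accrues
    }
}
```

An application then picks a phi threshold per deployment (LAN vs WAN), which is the tunability the issue description is after.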
[jira] Updated: (ZOOKEEPER-914) QuorumCnxManager blocks forever
[ https://issues.apache.org/jira/browse/ZOOKEEPER-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-914: --- Component/s: (was: server) (was: quorum) leaderElection QuorumCnxManager blocks forever Key: ZOOKEEPER-914 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-914 Project: Zookeeper Issue Type: Bug Components: leaderElection Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.3, 3.4.0 This was a disaster. While testing our application we ran into a scenario where a rebooted follower could not join the cluster. Further debugging showed that the follower could not join because the QuorumCnxManager on the leader was blocked for an indefinite amount of time in receiveConnection():
{noformat}
Thread-3 prio=10 tid=0x7fa920005800 nid=0x11bb runnable [0x7fa9275ed000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
at sun.nio.ch.IOUtil.read(IOUtil.java:206)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
- locked 0x7fa93315f988 (a java.lang.Object)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:210)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:501)
{noformat}
I had pointed out this bug, along with several other problems in QuorumCnxManager, earlier in https://issues.apache.org/jira/browse/ZOOKEEPER-900 and https://issues.apache.org/jira/browse/ZOOKEEPER-822. I forgot to patch this one as a part of ZOOKEEPER-822. I am working on a fix and a patch will be out soon. The problem is that QuorumCnxManager is using SocketChannel in blocking mode. It does a read() in receiveConnection() and a write() in initiateConnection(). Sorry, but this is really bad programming. It also points to a lack of failure tests for QuorumCnxManager. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-914) QuorumCnxManager blocks forever
[ https://issues.apache.org/jira/browse/ZOOKEEPER-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925746#action_12925746 ] Flavio Junqueira commented on ZOOKEEPER-914: Like Pat, I would also appreciate some more constructive comments (and behavior). From the Clover reports, we exercise a significant part of the QCM code; it is true, though, that we don't test the cases you have been exposing. Here is a way I believe we can reproduce this problem (I haven't implemented it, but it seems to make sense). The high-level idea is to make sure that if some server stops responding before it completes the handshake protocol, then no instance of QCM across all servers will block and prevent other servers from joining the ensemble. Suppose we configure an ensemble with 5 servers using QuorumBase. One of the servers will be a simple mock server, as we do in the CnxManagerTest tests. Now here is the sequence of steps to follow:
# Start three of the servers and confirm that they accept and execute operations;
# Start the mock server and execute the protocol partially. For the read case you mention, you can simply not send the server identifier. That will cause the read on the other end to block and to not accept more connections;
# Start a 5th server and check if it is able to join the ensemble.
A simple fix to get this working for you soon, along the lines of what we have done to make the connection timeout configurable, seems to be to set SO_TIMEOUT. But if you have other ideas, please lay them out. Please bear in mind that we should leave the major modifications for ZOOKEEPER-901, because those will take more time to develop and get into shape. QuorumCnxManager blocks forever Key: ZOOKEEPER-914 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-914 Project: Zookeeper Issue Type: Bug Components: leaderElection Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.3, 3.4.0 This was a disaster. 
While testing our application we ran into a scenario where a rebooted follower could not join the cluster. Further debugging showed that the follower could not join because the QuorumCnxManager on the leader was blocked for an indefinite amount of time in receiveConnection():
{noformat}
Thread-3 prio=10 tid=0x7fa920005800 nid=0x11bb runnable [0x7fa9275ed000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
at sun.nio.ch.IOUtil.read(IOUtil.java:206)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
- locked 0x7fa93315f988 (a java.lang.Object)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:210)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:501)
{noformat}
I had pointed out this bug, along with several other problems in QuorumCnxManager, earlier in https://issues.apache.org/jira/browse/ZOOKEEPER-900 and https://issues.apache.org/jira/browse/ZOOKEEPER-822. I forgot to patch this one as a part of ZOOKEEPER-822. I am working on a fix and a patch will be out soon. The problem is that QuorumCnxManager is using SocketChannel in blocking mode. It does a read() in receiveConnection() and a write() in initiateConnection(). Sorry, but this is really bad programming. It also points to a lack of failure tests for QuorumCnxManager. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
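The SO_TIMEOUT suggestion from the comment above can be sketched as follows. This is a hypothetical illustration, not the actual QuorumCnxManager code (the names readServerId and HandshakeTimeoutSketch are mine): the point is that bounding the blocking handshake read turns a stalled peer into an IOException instead of a listener stuck forever.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch of the SO_TIMEOUT idea; not QuorumCnxManager API.
// A peer that connects but never sends its server id now produces an
// IOException after the timeout instead of blocking the listener forever.
public class HandshakeTimeoutSketch {
    static long readServerId(Socket peer, int timeoutMillis) throws IOException {
        peer.setSoTimeout(timeoutMillis);            // blocking reads now time out
        DataInputStream in = new DataInputStream(peer.getInputStream());
        try {
            return in.readLong();                    // the handshake's server id
        } catch (SocketTimeoutException e) {
            throw new IOException("peer did not complete handshake", e);
        }
    }

    public static void main(String[] args) throws IOException {
        ServerSocket server = new ServerSocket(0);   // ephemeral port
        Socket client = new Socket("127.0.0.1", server.getLocalPort());
        Socket accepted = server.accept();
        try {
            readServerId(accepted, 200);             // the mock peer never writes
        } catch (IOException expected) {
            System.out.println("handshake aborted: " + expected.getMessage());
        }
        client.close();
        accepted.close();
        server.close();
    }
}
```

Note that setSoTimeout applies to stream reads on a plain Socket; a SocketChannel used in blocking mode, as in the issue, does not honor it, which is one more argument for the selector-based redesign deferred to ZOOKEEPER-901.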
[jira] Commented: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925754#action_12925754 ] Flavio Junqueira commented on ZOOKEEPER-885: Sure, let's discuss over e-mail and we can post our findings here later. Zookeeper drops connections under moderate IO load -- Key: ZOOKEEPER-885 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.2, 3.3.1 Environment: Debian (Lenny) 1Gb RAM swap disabled 100Mb heap for zookeeper Reporter: Alexandre Hardy Priority: Critical Attachments: benchmark.csv, tracezklogs.tar.gz, tracezklogs.tar.gz, WatcherTest.java, zklogs.tar.gz A zookeeper server under minimum load, with a number of clients watching exactly one node, will fail to maintain the connection when the machine is subjected to moderate IO load. In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command:
{noformat}
dd if=/dev/urandom of=/dev/mapper/nimbula-test
{noformat}
The {{dd}} command transferred data at a rate of about 4Mb/s. The same thing happens with
{noformat}
dd if=/dev/zero of=/dev/mapper/nimbula-test
{noformat}
It seems strange that such a moderate load should cause instability in the connection. Very few other processes were running; the machines were set up to test the connection instability we have experienced. Clients performed no other read or mutation operations. Although the documentation states that minimal competing IO load should be present on the zookeeper server, it seems reasonable that moderate IO should not cause problems in this case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-702: --- Status: Open (was: Patch Available) Hi Abmar, thanks for the addition to the patch. I was wondering if it is really a good idea to have both options, normal and exponential, implemented. Since your experiments have shown that exponential performs better, why not use it exclusively? Also, I was wondering if you have posted experimental numbers showing that exponential performs better. If we go with exponential only, then we don't need the modification to ivy.xml, right? One last comment: it doesn't look like the classes implementing PhiTimeoutEvaluator need to be public. Is this right? GSoC 2010: Failure Detector Model - Key: ZOOKEEPER-702 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702 Project: Zookeeper Issue Type: Wish Reporter: Henry Robinson Assignee: Abmar Barros Fix For: 3.4.0 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch Failure Detector Module Possible Mentor: Henry Robinson (henry at apache dot org) Requirements: Java, some distributed systems knowledge, comfort implementing distributed systems protocols Description: ZooKeeper servers detect the failure of other servers and clients by counting the number of 'ticks' for which they don't get a heartbeat from other machines. 
This is the 'timeout' method of failure detection and works very well; however it is possible that it is too aggressive and not easily tuned for some more unusual ZooKeeper installations (such as in a wide-area network, or even in a mobile ad-hoc network). This project would abstract the notion of failure detection to a dedicated Java module, and implement several failure detectors to compare and contrast their appropriateness for ZooKeeper. For example, Apache Cassandra uses a phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which is much more tunable and has some very interesting properties. This is a great project if you are interested in distributed algorithms, or want to help re-factor some of ZooKeeper's internal code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Allowing a ZooKeeper server to be part of multiple clusters
That proposal came in the context of federated zookeeper, and the motivation at the time was to use multiple overlapping clusters to enable increasing write throughput as we increase the number of servers. To my knowledge, we haven't made any progress on the implementation of such a feature. I'd be curious to understand what scenario Vishal envisions for such a 2-node cluster feature. If it is not federated, then we would have trouble with ZooKeeper because we rely upon one single leader to generate state updates. In the federated case, there is one leader (perhaps multiple during non-overlapping periods of time) for each partition. There is this wiki page I wrote a while back: http://wiki.apache.org/hadoop/ZooKeeper/PartitionedZookeeper Hope it helps.
-Flavio

On Oct 25, 2010, at 11:24 PM, Vishal K wrote:

Hi Mahadev, it lets one run multiple 2-node clusters. Suppose I have an application that does a simple 2-way mirroring of my data and uses ZK for clustering. If I need to support many 2-node clusters, where will I find the spare machines to run the third instance for each cluster?
-Vishal

On Mon, Oct 25, 2010 at 5:14 PM, Mahadev Konar maha...@yahoo-inc.com wrote:

Hi Vishal, this idea (2.) had been kicked around initially by Flavio. I think he'll probably chip in on the discussion. I am just curious: what's the idea behind your proposal? Is this to provide some kind of failure guarantees between a 2-node and a 3-node cluster?
Thanks
mahadev

On 10/25/10 1:05 PM, "Vishal K" vishalm...@gmail.com wrote:

Hi All, I am thinking about the choices one would have to support multiple 2-node clusters. Assume that for some reason one needs to support multiple 2-node clusters. This would mean they will have to figure out a way to run a third instance of ZK server for each cluster somewhere to ensure that a ZK cluster is available after a failure. This works well if we have to run one or two 2-node clusters. However, what if we have to run many 2-node clusters? I have the following options:
1. Find m machines to run the third instance of each cluster. Run n/m instances of ZK on each machine.
2. Modify ZooKeeper server to participate in multiple clusters. This will allow us to run y instances of the third node, where each instance will be part of n/y clusters.
3. Run the third instance of ZK server required for the ith cluster on one of the servers in the (i+1)%n cluster. Essentially, distribute the third instance across the other clusters.
The pros and cons of each approach are fairly obvious. While I prefer the third approach, I would like to check what everyone thinks about the second approach. Thanks.
-Vishal

flavio junqueira
research scientist
f...@yahoo-inc.com
direct +34 93-183-8828
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300
fax (408) 349 3301
Re: [VOTE] ZooKeeper as TLP?
+1

On Oct 23, 2010, at 12:47 AM, Henry Robinson wrote:

+1

On 22 October 2010 14:53, Mahadev Konar maha...@yahoo-inc.com wrote:

+1

On 10/22/10 2:42 PM, "Patrick Hunt" ph...@apache.org wrote:

Please vote as to whether you think ZooKeeper should become a top-level Apache project, as discussed previously on this list. I've included below a draft board resolution. Do folks support sending this request on to the Hadoop PMC?

Patrick

X. Establish the Apache ZooKeeper Project

WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software related to distributed system coordination for distribution at no charge to the public.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the "Apache ZooKeeper Project", be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache ZooKeeper Project be and hereby is responsible for the creation and maintenance of software related to distributed system coordination; and be it further

RESOLVED, that the office of "Vice President, Apache ZooKeeper" be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache ZooKeeper Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache ZooKeeper Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache ZooKeeper Project:

* Patrick Hunt ph...@apache.org
* Flavio Junqueira f...@apache.org
* Mahadev Konar maha...@apache.org
* Benjamin Reed br...@apache.org
* Henry Robinson he...@apache.org

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Patrick Hunt be appointed to the office of Vice President, Apache ZooKeeper, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further

RESOLVED, that the initial Apache ZooKeeper PMC be and hereby is tasked with the creation of a set of bylaws intended to encourage open development and increased participation in the Apache ZooKeeper Project; and be it further

RESOLVED, that the Apache ZooKeeper Project be and hereby is tasked with the migration and rationalization of the Apache Hadoop ZooKeeper sub-project; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Hadoop ZooKeeper sub-project encumbered upon the Apache Hadoop Project are hereafter discharged.

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
Re: Heisenbugs, Bohrbugs, Mandelbugs?
Thomas, could you open jiras and make available the logs for the tests that failed for you?

Thanks,
-Flavio

On Oct 22, 2010, at 7:56 PM, Thomas Koch wrote:

Mahadev Konar:

Hi Thomas, could you verify this by just testing the trunk without your patch? You might very well be right that those tests are a little flaky. As for the hudson builds, Nigel is working on getting the patch builds for zookeeper running. As soon as that gets fixed, these flaky tests would show up more often.
Thanks
mahadev

On 10/20/10 11:48 PM, "Thomas Koch" tho...@koch.ro wrote:

Hi, last night I let my hudson server do 42 (sic) builds of ZooKeeper trunk. One of these builds failed:
{noformat}
junit.framework.AssertionFailedError: Leader hasn't joined: 5
at org.apache.zookeeper.test.FLETest.testLE(FLETest.java:312)
{noformat}
I did this many builds of trunk because, in my quest to redo the client netty integration step by step, I made one step which resulted in 2 failed builds out of 8. The two failures were both:

Hi Mahadev, as I've written, I did 42 builds of trunk over the night, from which 2 failed, and 8 builds of my patch during work time with 2 failures. I also did another round of builds of my patch during last night and got only 1 failure out of ~40 successful builds. So I believe that the high failure rate of 2/8 from the initial round of patch builds is because I did those builds during the day, while other developers also used other virtual machines on the same host.

Have a nice weekend,

Thomas Koch, http://www.koch.ro
Re: Restarting discussion on ZooKeeper as a TLP
+1 for moving forward, and I was wondering if you have an idea of when you'd have a draft of the proposal. It would be good to iterate over it, perhaps.

-Flavio

On Oct 20, 2010, at 7:50 PM, Patrick Hunt wrote:

It's been a few days, any thoughts? Acceptable? I'd like to keep moving the ball forward. Thanks.

Patrick

On Sun, Oct 17, 2010 at 8:43 PM, 明珠刘 redis...@gmail.com wrote:

+1

2010/10/14 Patrick Hunt ph...@apache.org

In March of this year we discussed a request from the Apache Board, and Hadoop PMC, that we become a TLP rather than a subproject of Hadoop:

Original discussion: http://markmail.org/thread/42cobkpzlgotcbin

I originally voted against this move, my primary concern being that we were not "ready" to move to tlp status given our small contributor base and limited contributor diversity. However I'd now like to revisit that discussion/decision. Since that time the team has been working hard to attract new contributors, and we've seen significant new contributions come in. There has also been feedback from board/pmc addressing many of these concerns (both on the list and in private). I am now less concerned about this issue and don't see it as a blocker for us to move to TLP status.

A second concern was that by becoming a TLP the project would lose its connection with Hadoop, a big source of new users for us. I've been assured (and you can see with the other projects that have moved to tlp status; pig/hive/hbase/etc...) that this connection will be maintained. The Hadoop ZooKeeper tab, for example, will redirect to our new homepage.

Other Apache members also pointed out to me that we are essentially operating as a TLP within the Hadoop PMC. Most of the other PMC members have little or no experience with ZooKeeper, and this makes it difficult for them to monitor and advise us. By moving to TLP status we'll be able to govern ourselves and better set our direction.

I believe we are ready to become a TLP. Please respond to this email with your thoughts and any issues. I will call a vote in a few days, once discussion settles.

Regards,
Patrick
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-893: --- Status: Open (was: Patch Available) Missing a test. ZooKeeper high cpu usage when invalid requests -- Key: ZOOKEEPER-893 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Environment: Linux 2.6.16 4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz java version 1.6.0_17 Java(TM) SE Runtime Environment (build 1.6.0_17-b04) Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode) Reporter: Thijs Terlouw Assignee: Thijs Terlouw Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-893.patch Original Estimate: 1h Remaining Estimate: 1h When ZooKeeper receives certain illegally formed messages on the internal communication port (:4181 by default), it's possible for ZooKeeper to enter an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427, but that patch does not resolve all issues. From src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, the two affected parts:
{noformat}
int length = msgLength.getInt();
if (length <= 0) {
    throw new IOException("Invalid packet length: " + length);
}
{noformat}
{noformat}
while (message.hasRemaining()) {
    temp_numbytes = channel.read(message);
    if (temp_numbytes < 0) {
        throw new IOException("Channel eof before end");
    }
    numbytes += temp_numbytes;
}
{noformat}
How to replicate this bug: perform an nmap portscan against your zookeeper server (nmap -sV -n your.ip.here -p4181), wait for a while until you see some messages in the logfile, and then you will see 100% cpu usage. It does not recover from this situation. With my patch, it does not occur anymore. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
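The failure mode of the second affected part is easy to demonstrate in isolation: channel.read() returns -1 at end-of-stream, and a loop that does not treat a negative return as an error spins forever, because hasRemaining() never becomes false once the peer is gone. The sketch below is standalone illustration, not the QuorumCnxManager code, and shows the corrected check:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

// Standalone sketch, not QuorumCnxManager code: read() returns -1 at
// end-of-stream, and a loop that ignores a negative return would spin
// at 100% cpu because hasRemaining() stays true forever.
public class EofReadLoop {
    static int readFully(ReadableByteChannel channel, ByteBuffer message)
            throws IOException {
        int numbytes = 0;
        while (message.hasRemaining()) {
            int n = channel.read(message);
            if (n < 0) {                      // EOF: fail fast instead of looping
                throw new IOException("Channel eof before end");
            }
            numbytes += n;
        }
        return numbytes;
    }

    public static void main(String[] args) throws IOException {
        // The peer "sends" only 4 of the 8 expected bytes, then closes.
        ReadableByteChannel truncated =
                Channels.newChannel(new ByteArrayInputStream(new byte[4]));
        try {
            readFully(truncated, ByteBuffer.allocate(8));
        } catch (IOException expected) {
            System.out.println(expected.getMessage());  // prints: Channel eof before end
        }
    }
}
```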
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-893: --- Attachment: ZOOKEEPER-893.patch Adding a test, and removing from RecvWorker.run() an if statement that became unnecessary with this patch. I'll be adding a patch for the 3.3 branch shortly. ZooKeeper high cpu usage when invalid requests -- Key: ZOOKEEPER-893 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Environment: Linux 2.6.16 4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz java version 1.6.0_17 Java(TM) SE Runtime Environment (build 1.6.0_17-b04) Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode) Reporter: Thijs Terlouw Assignee: Thijs Terlouw Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-893.patch, ZOOKEEPER-893.patch Original Estimate: 1h Remaining Estimate: 1h When ZooKeeper receives certain illegally formed messages on the internal communication port (:4181 by default), it's possible for ZooKeeper to enter an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427, but that patch does not resolve all issues. From src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, the two affected parts:
{noformat}
int length = msgLength.getInt();
if (length <= 0) {
    throw new IOException("Invalid packet length: " + length);
}
{noformat}
{noformat}
while (message.hasRemaining()) {
    temp_numbytes = channel.read(message);
    if (temp_numbytes < 0) {
        throw new IOException("Channel eof before end");
    }
    numbytes += temp_numbytes;
}
{noformat}
How to replicate this bug: perform an nmap portscan against your zookeeper server (nmap -sV -n your.ip.here -p4181), wait for a while until you see some messages in the logfile, and then you will see 100% cpu usage. It does not recover from this situation. With my patch, it does not occur anymore. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-893: --- Attachment: ZOOKEEPER-893-3.3.patch Thanks, Thijs. Adding 3.3 patch. ZooKeeper high cpu usage when invalid requests -- Key: ZOOKEEPER-893 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Environment: Linux 2.6.16 4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz java version 1.6.0_17 Java(TM) SE Runtime Environment (build 1.6.0_17-b04) Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode) Reporter: Thijs Terlouw Assignee: Thijs Terlouw Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-893-3.3.patch, ZOOKEEPER-893.patch, ZOOKEEPER-893.patch Original Estimate: 1h Remaining Estimate: 1h When ZooKeeper receives certain illegally formed messages on the internal communication port (:4181 by default), it's possible for ZooKeeper to enter an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427, but that patch does not resolve all issues. From src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, the two affected parts:
{noformat}
int length = msgLength.getInt();
if (length <= 0) {
    throw new IOException("Invalid packet length: " + length);
}
{noformat}
{noformat}
while (message.hasRemaining()) {
    temp_numbytes = channel.read(message);
    if (temp_numbytes < 0) {
        throw new IOException("Channel eof before end");
    }
    numbytes += temp_numbytes;
}
{noformat}
How to replicate this bug: perform an nmap portscan against your zookeeper server (nmap -sV -n your.ip.here -p4181), wait for a while until you see some messages in the logfile, and then you will see 100% cpu usage. It does not recover from this situation. With my patch, it does not occur anymore. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-893: --- Status: Patch Available (was: Open) ZooKeeper high cpu usage when invalid requests -- Key: ZOOKEEPER-893 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Environment: Linux 2.6.16 4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz java version 1.6.0_17 Java(TM) SE Runtime Environment (build 1.6.0_17-b04) Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode) Reporter: Thijs Terlouw Assignee: Thijs Terlouw Priority: Critical Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-893-3.3.patch, ZOOKEEPER-893.patch, ZOOKEEPER-893.patch Original Estimate: 1h Remaining Estimate: 1h When ZooKeeper receives certain illegally formed messages on the internal communication port (:4181 by default), it's possible for ZooKeeper to enter an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427, but that patch does not resolve all issues. From src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, the two affected parts:
{noformat}
int length = msgLength.getInt();
if (length <= 0) {
    throw new IOException("Invalid packet length: " + length);
}
{noformat}
{noformat}
while (message.hasRemaining()) {
    temp_numbytes = channel.read(message);
    if (temp_numbytes < 0) {
        throw new IOException("Channel eof before end");
    }
    numbytes += temp_numbytes;
}
{noformat}
How to replicate this bug: perform an nmap portscan against your zookeeper server (nmap -sV -n your.ip.here -p4181), wait for a while until you see some messages in the logfile, and then you will see 100% cpu usage. It does not recover from this situation. With my patch, it does not occur anymore. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-901) Redesign of QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921997#action_12921997 ] Flavio Junqueira commented on ZOOKEEPER-901: It is a good point, Pat. It crossed my mind, but I thought it would be overkill to use netty. However, if it is simpler to have it for compatibility and uniformity purposes, then we should consider it. Redesign of QuorumCnxManager Key: ZOOKEEPER-901 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-901 Project: Zookeeper Issue Type: Improvement Components: leaderElection Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Fix For: 3.4.0 QuorumCnxManager manages TCP connections between ZooKeeper servers for leader election in replicated mode. We have identified over time a couple of deficiencies that we would like to fix. Unfortunately, fixing these issues requires a little more than just generating a couple of small patches. More specifically, I propose, based on previous discussions with the community, that we reimplement QuorumCnxManager so that we achieve the following: # Establishing connections should not be a blocking operation, and perhaps even more important, it shouldn't prevent the establishment of connections with other servers; # Using a pair of threads per connection is a little messy, and we have seen issues over time due to the creation and destruction of such threads. A more reasonable approach is to have a single thread and a selector. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
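The single-thread-plus-selector approach proposed in the second point can be sketched with plain java.nio. This is only an illustration of the pattern under discussion, not the eventual ZOOKEEPER-901 design, and the names are mine: one selector multiplexes accept and read readiness, so no single peer can hold up connection establishment with the others.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Illustration of the single-thread-plus-selector pattern only, not the
// eventual ZOOKEEPER-901 implementation.
public class SelectorLoopSketch {
    // Accept every connection that is ready right now, registering each
    // accepted peer for reads on the same selector; bounded wait, no
    // per-connection threads.
    static int acceptReady(Selector selector, ServerSocketChannel server)
            throws IOException {
        int accepted = 0;
        if (selector.select(1000) == 0) {
            return 0;                               // nothing ready within 1s
        }
        for (SelectionKey key : selector.selectedKeys()) {
            if (key.isAcceptable()) {
                SocketChannel peer = server.accept();   // returns immediately
                peer.configureBlocking(false);
                peer.register(selector, SelectionKey.OP_READ);
                accepted++;
            }
        }
        selector.selectedKeys().clear();
        return accepted;
    }

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);            // never block in accept()
        server.bind(new InetSocketAddress(0));      // ephemeral port
        server.register(selector, SelectionKey.OP_ACCEPT);
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
        Socket client = new Socket("127.0.0.1", port);
        System.out.println("accepted: " + acceptReady(selector, server));
        client.close();
        server.close();
        selector.close();
    }
}
```

In a full loop the same thread would also handle OP_READ and OP_CONNECT keys, which is what replaces the per-connection SendWorker/RecvWorker thread pairs the issue complains about.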
[jira] Updated: (ZOOKEEPER-881) ZooKeeperServer.loadData loads database twice
[ https://issues.apache.org/jira/browse/ZOOKEEPER-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-881: --- Resolution: Fixed Status: Resolved (was: Patch Available) Ben forgot to close this issue. ZooKeeperServer.loadData loads database twice - Key: ZOOKEEPER-881 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-881 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Trivial Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-881.patch zkDb.loadDataBase() is called twice at the beginning of loadData(). It shouldn't have any negative effects, but is unnecessary. A patch should be trivial. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (ZOOKEEPER-881) ZooKeeperServer.loadData loads database twice
[ https://issues.apache.org/jira/browse/ZOOKEEPER-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira resolved ZOOKEEPER-881. Resolution: Fixed Committed to the 3.3 branch (Committed revision 1023935.) ZooKeeperServer.loadData loads database twice - Key: ZOOKEEPER-881 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-881 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Trivial Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-881.patch zkDb.loadDataBase() is called twice at the beginning of loadData(). It shouldn't have any negative effects, but is unnecessary. A patch should be trivial. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-786) Exception in ZooKeeper.toString
[ https://issues.apache.org/jira/browse/ZOOKEEPER-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-786: --- Priority: Minor (was: Major) Fix Version/s: (was: 3.3.2) Exception in ZooKeeper.toString --- Key: ZOOKEEPER-786 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-786 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Environment: Mac OS X, x86 Reporter: Stephen Green Priority: Minor Fix For: 3.4.0 When trying to call ZooKeeper.toString during client disconnections, an exception can be generated:
{noformat}
[04/06/10 15:39:57.744] ERROR Error while calling watcher
java.lang.Error: java.net.SocketException: Socket operation on non-socket
at sun.nio.ch.Net.localAddress(Net.java:128)
at sun.nio.ch.SocketChannelImpl.localAddress(SocketChannelImpl.java:430)
at sun.nio.ch.SocketAdaptor.getLocalAddress(SocketAdaptor.java:147)
at java.net.Socket.getLocalSocketAddress(Socket.java:717)
at org.apache.zookeeper.ClientCnxn.getLocalSocketAddress(ClientCnxn.java:227)
at org.apache.zookeeper.ClientCnxn.toString(ClientCnxn.java:183)
at java.lang.String.valueOf(String.java:2826)
at java.lang.StringBuilder.append(StringBuilder.java:115)
at org.apache.zookeeper.ZooKeeper.toString(ZooKeeper.java:1486)
at java.util.Formatter$FormatSpecifier.printString(Formatter.java:2794)
at java.util.Formatter$FormatSpecifier.print(Formatter.java:2677)
at java.util.Formatter.format(Formatter.java:2433)
at java.util.Formatter.format(Formatter.java:2367)
at java.lang.String.format(String.java:2769)
at com.echonest.cluster.ZooContainer.process(ZooContainer.java:544)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
Caused by: java.net.SocketException: Socket operation on non-socket
at sun.nio.ch.Net.localInetAddress(Native Method)
at sun.nio.ch.Net.localAddress(Net.java:125)
... 15 more
{noformat}
-- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
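The trace above shows a java.lang.Error escaping from Socket.getLocalSocketAddress while the client is mid-disconnect, which then breaks toString. A minimal defensive sketch of the idea (all names invented; this is not the actual ClientCnxn code) is to contain the Error and fall back to a placeholder string:

```java
import java.net.Socket;

public class SafeToString {
    // Hypothetical helper: describe a socket without letting a
    // disconnect-time failure propagate out of toString.
    static String describe(Socket sock) {
        try {
            return "local:" + sock.getLocalSocketAddress();
        } catch (Error | RuntimeException e) {
            // Socket.getLocalSocketAddress can surface a java.lang.Error
            // ("Socket operation on non-socket") during disconnection;
            // fall back instead of killing the caller.
            return "local:unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(null)); // prints local:unknown
    }
}
```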
[jira] Commented: (ZOOKEEPER-786) Exception in ZooKeeper.toString
[ https://issues.apache.org/jira/browse/ZOOKEEPER-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12922216#action_12922216 ] Flavio Junqueira commented on ZOOKEEPER-786: Since this seems to be a minor issue and to avoid further delays with 3.3.2, I propose we move it to 3.4.0. Exception in ZooKeeper.toString --- Key: ZOOKEEPER-786 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-786 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.3.1 Environment: Mac OS X, x86 Reporter: Stephen Green Fix For: 3.4.0 When trying to call ZooKeeper.toString during client disconnections, an exception can be generated: [04/06/10 15:39:57.744] ERROR Error while calling watcher java.lang.Error: java.net.SocketException: Socket operation on non-socket at sun.nio.ch.Net.localAddress(Net.java:128) at sun.nio.ch.SocketChannelImpl.localAddress(SocketChannelImpl.java:430) at sun.nio.ch.SocketAdaptor.getLocalAddress(SocketAdaptor.java:147) at java.net.Socket.getLocalSocketAddress(Socket.java:717) at org.apache.zookeeper.ClientCnxn.getLocalSocketAddress(ClientCnxn.java:227) at org.apache.zookeeper.ClientCnxn.toString(ClientCnxn.java:183) at java.lang.String.valueOf(String.java:2826) at java.lang.StringBuilder.append(StringBuilder.java:115) at org.apache.zookeeper.ZooKeeper.toString(ZooKeeper.java:1486) at java.util.Formatter$FormatSpecifier.printString(Formatter.java:2794) at java.util.Formatter$FormatSpecifier.print(Formatter.java:2677) at java.util.Formatter.format(Formatter.java:2433) at java.util.Formatter.format(Formatter.java:2367) at java.lang.String.format(String.java:2769) at com.echonest.cluster.ZooContainer.process(ZooContainer.java:544) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488) Caused by: java.net.SocketException: Socket operation on non-socket at sun.nio.ch.Net.localInetAddress(Native Method) at sun.nio.ch.Net.localAddress(Net.java:125) ... 
15 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-855) clientPortBindAddress should be clientPortAddress
[ https://issues.apache.org/jira/browse/ZOOKEEPER-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12922238#action_12922238 ] Flavio Junqueira commented on ZOOKEEPER-855: +1, I'll commit this in a minute. clientPortBindAddress should be clientPortAddress - Key: ZOOKEEPER-855 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-855 Project: Zookeeper Issue Type: Bug Components: documentation Affects Versions: 3.3.0, 3.3.1 Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Trivial Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-855.patch The server documentation states that the configuration parameter for binding to a specific IP address is clientPortBindAddress. The code expects the parameter clientPortAddress. The documentation for the 3.3.X versions needs to be changed to reflect the correct parameter. This parameter was added in ZOOKEEPER-635. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
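For reference, a zoo.cfg fragment using the parameter name the code actually reads (the address value below is just an example):

```
# clientPortAddress, not clientPortBindAddress, is what the server parses.
clientPort=2181
clientPortAddress=127.0.0.1
```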
[jira] Updated: (ZOOKEEPER-855) clientPortBindAddress should be clientPortAddress
[ https://issues.apache.org/jira/browse/ZOOKEEPER-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-855: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks, Jared, I have just committed this: Branch 3.3: Committed revision 1024022. Trunk: Committed revision 1024029. clientPortBindAddress should be clientPortAddress - Key: ZOOKEEPER-855 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-855 Project: Zookeeper Issue Type: Bug Components: documentation Affects Versions: 3.3.0, 3.3.1 Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Trivial Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-855.patch, ZOOKEEPER-855.patch The server documentation states that the configuration parameter for binding to a specific IP address is clientPortBindAddress. The code expects the parameter clientPortAddress. The documentation for the 3.3.X versions needs to be changed to reflect the correct parameter. This parameter was added in ZOOKEEPER-635. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-855) clientPortBindAddress should be clientPortAddress
[ https://issues.apache.org/jira/browse/ZOOKEEPER-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-855: --- Attachment: ZOOKEEPER-855.patch I'm uploading the patch I committed. The original patch was modifying the HTML instead of the XML source. clientPortBindAddress should be clientPortAddress - Key: ZOOKEEPER-855 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-855 Project: Zookeeper Issue Type: Bug Components: documentation Affects Versions: 3.3.0, 3.3.1 Reporter: Jared Cantwell Assignee: Jared Cantwell Priority: Trivial Fix For: 3.3.2, 3.4.0 Attachments: ZOOKEEPER-855.patch, ZOOKEEPER-855.patch The server documentation states that the configuration parameter for binding to a specific IP address is clientPortBindAddress. The code expects the parameter clientPortAddress. The documentation for the 3.3.X versions needs to be changed to reflect the correct parameter. This parameter was added in ZOOKEEPER-635. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-901) Redesign of QuorumCnxManager
Redesign of QuorumCnxManager Key: ZOOKEEPER-901 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-901 Project: Zookeeper Issue Type: Improvement Components: leaderElection Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Fix For: 3.4.0 QuorumCnxManager manages TCP connections between ZooKeeper servers for leader election in replicated mode. We have identified over time a couple of deficiencies that we would like to fix. Unfortunately, fixing these issues requires a little more than just generating a couple of small patches. More specifically, I propose, based on previous discussions with the community, that we reimplement QuorumCnxManager so that we achieve the following: # Establishing connections should not be a blocking operation, and perhaps even more important, it shouldn't prevent the establishment of connections with other servers; # Using a pair of threads per connection is a little messy, and we have seen issues over time due to the creation and destruction of such threads. A more reasonable approach is to have a single thread and a selector. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
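The second point of the proposal can be sketched with plain Java NIO: a single selector-driven thread accepts and later reads from every peer channel, so no per-connection Send/Recv thread pair is needed. This is an illustrative sketch under invented names, not the proposed QuorumCnxManager code:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SelectorSketch {
    static int acceptedCount = 0; // peers picked up by the single thread

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
        server.configureBlocking(false); // nothing in the loop may block
        server.register(selector, SelectionKey.OP_ACCEPT);

        // Simulate another server connecting on the election port.
        SocketChannel peer = SocketChannel.open(server.getLocalAddress());

        // One pass of the event loop; a real manager would loop forever and
        // also register OP_CONNECT for non-blocking outgoing connections.
        selector.select(5000);
        for (SelectionKey key : selector.selectedKeys()) {
            if (key.isAcceptable()) {
                SocketChannel accepted = server.accept();
                accepted.configureBlocking(false);
                accepted.register(selector, SelectionKey.OP_READ); // same thread reads later
                acceptedCount++;
            }
        }
        selector.selectedKeys().clear();

        peer.close();
        server.close();
        selector.close();
        System.out.println("connections handled by one thread: " + acceptedCount);
    }
}
```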
[jira] Commented: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921467#action_12921467 ] Flavio Junqueira commented on ZOOKEEPER-885: I'm not sure it is that simple, Dave. The problem is that pings do not require writes to disk, and in the scenario that Alexandre describes, there are only pings being processed. Why is the background I/O load affecting the processing of ZooKeeper? And in particular, why are sessions expiring as a consequence of this background I/O load? Zookeeper drops connections under moderate IO load -- Key: ZOOKEEPER-885 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.2, 3.3.1 Environment: Debian (Lenny) 1Gb RAM swap disabled 100Mb heap for zookeeper Reporter: Alexandre Hardy Priority: Critical Attachments: benchmark.csv, tracezklogs.tar.gz, tracezklogs.tar.gz, WatcherTest.java, zklogs.tar.gz A zookeeper server under minimum load, with a number of clients watching exactly one node, will fail to maintain the connection when the machine is subjected to moderate IO load. In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command: {noformat} dd if=/dev/urandom of=/dev/mapper/nimbula-test {noformat} The {{dd}} command transferred data at a rate of about 4Mb/s. The same thing happens with {noformat} dd if=/dev/zero of=/dev/mapper/nimbula-test {noformat} It seems strange that such a moderate load should cause instability in the connection. Very few other processes were running; the machines were set up to test the connection instability we have experienced. Clients performed no other read or mutation operations. 
Although the documentation states that minimal competing IO load should be present on the zookeeper server, it seems reasonable that moderate IO should not cause problems in this case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921557#action_12921557 ] Flavio Junqueira commented on ZOOKEEPER-885: I've been running it and there is no traffic to the disk while the clients are watching. We generate a snapshot every snapCount transactions, but given that there are no transactions generated, no transaction is appended to the log and no new snapshot is written. Zookeeper drops connections under moderate IO load -- Key: ZOOKEEPER-885 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.2, 3.3.1 Environment: Debian (Lenny) 1Gb RAM swap disabled 100Mb heap for zookeeper Reporter: Alexandre Hardy Priority: Critical Attachments: benchmark.csv, tracezklogs.tar.gz, tracezklogs.tar.gz, WatcherTest.java, zklogs.tar.gz A zookeeper server under minimum load, with a number of clients watching exactly one node, will fail to maintain the connection when the machine is subjected to moderate IO load. In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command: {noformat} dd if=/dev/urandom of=/dev/mapper/nimbula-test {noformat} The {{dd}} command transferred data at a rate of about 4Mb/s. The same thing happens with {noformat} dd if=/dev/zero of=/dev/mapper/nimbula-test {noformat} It seems strange that such a moderate load should cause instability in the connection. Very few other processes were running; the machines were set up to test the connection instability we have experienced. Clients performed no other read or mutation operations. Although the documentation states that minimal competing IO load should be present on the zookeeper server, it seems reasonable that moderate IO should not cause problems in this case. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Restarting discussion on ZooKeeper as a TLP
+1. Frankly, I don't see concrete benefits for the community with ZooKeeper becoming a TLP, but perhaps it will become clear over time. Now it is certainly cool to have our own top-level domain: http://zookeeper.apache.org/ rocks! -Flavio
On Oct 14, 2010, at 1:00 PM, Benjamin Reed wrote: +1 ben
On 10/14/2010 11:47 AM, Henry Robinson wrote: +1, I agree that we've addressed most outstanding concerns, we're ready for TLP. Henry
On 14 October 2010 13:29, Mahadev Konar maha...@yahoo-inc.com wrote: +1 for moving to TLP. Thanks for starting the vote Pat. mahadev
On 10/13/10 2:10 PM, "Patrick Hunt" ph...@apache.org wrote: In March of this year we discussed a request from the Apache Board, and Hadoop PMC, that we become a TLP rather than a subproject of Hadoop: Original discussion http://markmail.org/thread/42cobkpzlgotcbin I originally voted against this move, my primary concern being that we were not "ready" to move to tlp status given our small contributor base and limited contributor diversity. However I'd now like to revisit that discussion/decision. Since that time the team has been working hard to attract new contributors, and we've seen significant new contributions come in. There has also been feedback from board/pmc addressing many of these concerns (both on the list and in private). I am now less concerned about this issue and don't see it as a blocker for us to move to TLP status. A second concern was that by becoming a TLP the project would lose its connection with Hadoop, a big source of new users for us. I've been assured (and you can see with the other projects that have moved to tlp status; pig/hive/hbase/etc...) that this connection will be maintained. The Hadoop ZooKeeper tab for example will redirect to our new homepage. Other Apache members also pointed out to me that we are essentially operating as a TLP within the Hadoop PMC. Most of the other PMC members have little or no experience with ZooKeeper and this makes it difficult for them to monitor and advise us. By moving to TLP status we'll be able to govern ourselves and better set our direction. I believe we are ready to become a TLP. Please respond to this email with your thoughts and any issues. I will call a vote in a few days, once discussion settles. Regards, Patrick
flavio junqueira research scientist f...@yahoo-inc.com direct +34 93-183-8828 avinguda diagonal 177, 8th floor, barcelona, 08018, es phone (408) 349 3300 fax (408) 349 3301
[jira] Commented: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12921218#action_12921218 ] Flavio Junqueira commented on ZOOKEEPER-885: Hi Alexandre, When you load the machines running the zookeeper servers with the dd command, how much time elapses between running dd and observing the connections expiring? I haven't been able to reproduce it, and I wonder how long the problem takes to manifest. Zookeeper drops connections under moderate IO load -- Key: ZOOKEEPER-885 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.2, 3.3.1 Environment: Debian (Lenny) 1Gb RAM swap disabled 100Mb heap for zookeeper Reporter: Alexandre Hardy Priority: Critical Attachments: tracezklogs.tar.gz, tracezklogs.tar.gz, WatcherTest.java, zklogs.tar.gz A zookeeper server under minimum load, with a number of clients watching exactly one node, will fail to maintain the connection when the machine is subjected to moderate IO load. In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command: {noformat} dd if=/dev/urandom of=/dev/mapper/nimbula-test {noformat} The {{dd}} command transferred data at a rate of about 4Mb/s. The same thing happens with {noformat} dd if=/dev/zero of=/dev/mapper/nimbula-test {noformat} It seems strange that such a moderate load should cause instability in the connection. Very few other processes were running; the machines were set up to test the connection instability we have experienced. Clients performed no other read or mutation operations. Although the documentation states that minimal competing IO load should be present on the zookeeper server, it seems reasonable that moderate IO should not cause problems in this case. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12920713#action_12920713 ] Flavio Junqueira commented on ZOOKEEPER-885: I remember fixing an issue a while back with CommitProcessor, which was being killed by a runtime exception. As Pat pointed out, it does look like the pipeline is stalling, but it is still unclear why, and I couldn't find anything that indicates the cause. Let me try to reproduce it. Zookeeper drops connections under moderate IO load -- Key: ZOOKEEPER-885 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.2.2, 3.3.1 Environment: Debian (Lenny) 1Gb RAM swap disabled 100Mb heap for zookeeper Reporter: Alexandre Hardy Priority: Critical Attachments: tracezklogs.tar.gz, tracezklogs.tar.gz, WatcherTest.java, zklogs.tar.gz A zookeeper server under minimum load, with a number of clients watching exactly one node, will fail to maintain the connection when the machine is subjected to moderate IO load. In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command: {noformat} dd if=/dev/urandom of=/dev/mapper/nimbula-test {noformat} The {{dd}} command transferred data at a rate of about 4Mb/s. The same thing happens with {noformat} dd if=/dev/zero of=/dev/mapper/nimbula-test {noformat} It seems strange that such a moderate load should cause instability in the connection. Very few other processes were running; the machines were set up to test the connection instability we have experienced. Clients performed no other read or mutation operations. 
Although the documentation states that minimal competing IO load should be present on the zookeeper server, it seems reasonable that moderate IO should not cause problems in this case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-884) Remove LedgerSequence references from BookKeeper documentation and comments in tests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-884: --- Attachment: ZOOKEEPER-884.patch This is a very simple patch, and it fixes mostly documentation and comments. Given the pace that patches are making progress in ZooKeeper these days, I'll +1 it myself (at the risk of not having any value :-) ). Remove LedgerSequence references from BookKeeper documentation and comments in tests - Key: ZOOKEEPER-884 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-884 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Attachments: ZOOKEEPER-884.patch We no longer use LedgerSequence, so we need to remove references in documentation and comments sprinkled throughout the code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-884) Remove LedgerSequence references from BookKeeper documentation and comments in tests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-884: --- Status: Patch Available (was: Open) Remove LedgerSequence references from BookKeeper documentation and comments in tests - Key: ZOOKEEPER-884 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-884 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira Attachments: ZOOKEEPER-884.patch We no longer use LedgerSequence, so we need to remove references in documentation and comments sprinkled throughout the code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-884) Remove LedgerSequence references from BookKeeper documentation and comments in tests
Remove LedgerSequence references from BookKeeper documentation and comments in tests - Key: ZOOKEEPER-884 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-884 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Affects Versions: 3.3.1 Reporter: Flavio Junqueira We no longer use LedgerSequence, so we need to remove references in documentation and comments sprinkled throughout the code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (ZOOKEEPER-884) Remove LedgerSequence references from BookKeeper documentation and comments in tests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira reassigned ZOOKEEPER-884: -- Assignee: Flavio Junqueira Remove LedgerSequence references from BookKeeper documentation and comments in tests - Key: ZOOKEEPER-884 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-884 Project: Zookeeper Issue Type: Bug Components: contrib-bookkeeper Affects Versions: 3.3.1 Reporter: Flavio Junqueira Assignee: Flavio Junqueira We no longer use LedgerSequence, so we need to remove references in documentation and comments sprinkled throughout the code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-882) Startup loads last transaction from snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916410#action_12916410 ] Flavio Junqueira commented on ZOOKEEPER-882: (I meant to post a comment yesterday, but jira decided to re-index right at the time) I like the way you structured the restore loop, it is simpler and easier to read, and I can't find any problem with it. About the severity of the bug, my interpretation is that it is harmless to re-execute the transaction, but still worth proposing a patch. Startup loads last transaction from snapshot Key: ZOOKEEPER-882 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-882 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Priority: Minor Attachments: 882.diff, restore On startup, the server first loads the latest snapshot, and then loads from the log starting at the last transaction in the snapshot. It should begin from one past that last transaction in the log. I will attach a possible patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-883) Idle cluster increasingly consumes CPU resources
[ https://issues.apache.org/jira/browse/ZOOKEEPER-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916446#action_12916446 ] Flavio Junqueira commented on ZOOKEEPER-883: I think this issue is related to ZOOKEEPER-880. It seems that the connections nagios creates start a RecvWorker and a SendWorker, and once they close, they kill RecvWorker but not SendWorker, so for every notification sent there is an orphan RecvWorker. I see two options: # Patch it so that it also kills the SendWorker instance; # Decline connection requests from unknown servers. I'm also curious to understand why you guys are monitoring the election port. Idle cluster increasingly consumes CPU resources Key: ZOOKEEPER-883 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-883 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Lars George Attachments: Archive.zip Monitoring the ZooKeeper nodes by polling the various ports using Nagios' open port checks seems to cause a substantial rise in the CPU used by the ZooKeeper daemons. Over the course of a week an idle cluster grew from a baseline 2% to 10% CPU usage. Attached is a stack dump and logs showing the occupied threads. At the end the daemon starts failing with "too many open files" errors as all handles are used up. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-883) Idle cluster increasingly consumes CPU resources
[ https://issues.apache.org/jira/browse/ZOOKEEPER-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916476#action_12916476 ] Flavio Junqueira commented on ZOOKEEPER-883: I meant to say that there is an orphan SendWorker, not an orphan RecvWorker. Idle cluster increasingly consumes CPU resources Key: ZOOKEEPER-883 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-883 Project: Zookeeper Issue Type: Bug Components: server Affects Versions: 3.3.1 Reporter: Lars George Attachments: Archive.zip Monitoring the ZooKeeper nodes by polling the various ports using Nagios' open port checks seems to cause a substantial rise in the CPU used by the ZooKeeper daemons. Over the course of a week an idle cluster grew from a baseline 2% to 10% CPU usage. Attached is a stack dump and logs showing the occupied threads. At the end the daemon starts failing with "too many open files" errors as all handles are used up. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
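The first option proposed above, killing the SendWorker counterpart when its RecvWorker dies, can be sketched as follows. The class and field names are simplified stand-ins for the real QuorumCnxManager inner classes, not the actual fix:

```java
public class WorkerPairSketch {
    static class SendWorker {
        volatile boolean running = true;
        void finish() { running = false; }
    }

    static class RecvWorker {
        final SendWorker counterpart; // sender for the same channel
        volatile boolean running = true;
        RecvWorker(SendWorker counterpart) { this.counterpart = counterpart; }
        void finish() {
            running = false;
            counterpart.finish(); // the missing step: stop the sender too
        }
    }

    public static void main(String[] args) {
        SendWorker sw = new SendWorker();
        RecvWorker rw = new RecvWorker(sw);
        rw.finish();                    // e.g. IOException: Channel eof
        System.out.println(sw.running); // prints false: no orphan sender
    }
}
```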
[jira] Commented: (ZOOKEEPER-882) Startup loads last transaction from snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916065#action_12916065 ] Flavio Junqueira commented on ZOOKEEPER-882: Hi Jared, Thanks for bringing this up. It doesn't look like that extra call to next() is necessary. If there is another file to process, then the call to next will return true and we will keep processing transactions, no? Startup loads last transaction from snapshot Key: ZOOKEEPER-882 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-882 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Priority: Minor Attachments: 882.diff On startup, the server first loads the latest snapshot, and then loads from the log starting at the last transaction in the snapshot. It should begin from one past that last transaction in the log. I will attach a possible patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-882) Startup loads last transaction from snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916070#action_12916070 ] Flavio Junqueira commented on ZOOKEEPER-882: I'm also not clear on your second point. If you check FileTxnIterator.init(), then it seems to me that the zxid passed as a parameter should be included, so not dt.lastProcessedZxid+1. What am I missing? Startup loads last transaction from snapshot Key: ZOOKEEPER-882 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-882 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Priority: Minor Attachments: 882.diff On startup, the server first loads the latest snapshot, and then loads from the log starting at the last transaction in the snapshot. It should begin from one past that last transaction in the log. I will attach a possible patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-882) Startup loads last transaction from snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916145#action_12916145 ] Flavio Junqueira commented on ZOOKEEPER-882: I agree with your description of the behavior of next, and sounds right to me that we should be setting hdr and calling return next(); at the end of the catch block. Regarding init(), we first use the value of zxid to determine which log files to read: all log files tagged with a value higher than zxid and the last log file that is less than zxid. Next we iterate over the log files until hdr.getZxid() is greater or equal to zxid (should be zxid really). This guarantees that the next call to next(), after init() returns, will return zxid+1. Does it sound right to you? Startup loads last transaction from snapshot Key: ZOOKEEPER-882 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-882 Project: Zookeeper Issue Type: Bug Components: server Reporter: Jared Cantwell Priority: Minor Attachments: 882.diff On startup, the server first loads the latest snapshot, and then loads from the log starting at the last transaction in the snapshot. It should begin from one past that last transaction in the log. I will attach a possible patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
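The init()/next() contract discussed above can be modeled with a toy iterator over an in-memory list of zxids: initialization skips everything up to and including the snapshot's last zxid, so the first next() yields the transaction after it. This is a toy model for illustration, not the FileTxnLog code:

```java
public class TxnIteratorSketch {
    final long[] log; // zxids in the transaction log, in order
    int pos = 0;

    TxnIteratorSketch(long[] log, long snapshotZxid) {
        this.log = log;
        // "init": advance past every transaction already in the snapshot
        while (pos < log.length && log[pos] <= snapshotZxid) {
            pos++;
        }
    }

    Long next() { // null when the log is exhausted
        return pos < log.length ? log[pos++] : null;
    }

    public static void main(String[] args) {
        TxnIteratorSketch it =
            new TxnIteratorSketch(new long[] {1, 2, 3, 4, 5}, 3);
        System.out.println(it.next()); // prints 4: one past the snapshot
    }
}
```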
[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915825#action_12915825 ] Flavio Junqueira commented on ZOOKEEPER-702: +1, I'm pretty happy with the patch. GSoC 2010: Failure Detector Model - Key: ZOOKEEPER-702 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702 Project: Zookeeper Issue Type: Wish Reporter: Henry Robinson Assignee: Abmar Barros Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch Failure Detector Module Possible Mentor Henry Robinson (henry at apache dot org) Requirements Java, some distributed systems knowledge, comfort implementing distributed systems protocols Description ZooKeeper servers detect the failure of other servers and clients by counting the number of 'ticks' for which they don't get a heartbeat from other machines. This is the 'timeout' method of failure detection and works very well; however, it is possible that it is too aggressive and not easily tuned for some more unusual ZooKeeper installations (such as in a wide-area network, or even in a mobile ad-hoc network). This project would abstract the notion of failure detection to a dedicated Java module, and implement several failure detectors to compare and contrast their appropriateness for ZooKeeper. For example, Apache Cassandra uses a phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which is much more tunable and has some very interesting properties. 
This is a great project if you are interested in distributed algorithms, or want to help re-factor some of ZooKeeper's internal code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
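As a baseline for the comparison the project proposes, the tick-counting "timeout" detector described above can be sketched in a few lines (all names invented; this is not the ZooKeeper implementation):

```java
public class TickFailureDetector {
    final int limit;              // ticks allowed without a heartbeat
    int ticksSinceHeartbeat = 0;

    TickFailureDetector(int limit) { this.limit = limit; }

    void heartbeat() { ticksSinceHeartbeat = 0; } // peer is alive again

    boolean tick() { // advance one tick; true means the peer is suspected
        return ++ticksSinceHeartbeat > limit;
    }

    public static void main(String[] args) {
        TickFailureDetector fd = new TickFailureDetector(2);
        fd.heartbeat();
        System.out.println(fd.tick()); // prints false: 1 missed tick
        System.out.println(fd.tick()); // prints false: 2 missed ticks
        System.out.println(fd.tick()); // prints true: past the limit
    }
}
```

A phi-accrual detector replaces the fixed limit with a suspicion level computed from the observed heartbeat-interval distribution, which is what makes it more tunable.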
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Open (was: Patch Available) Leader election taking a long time to complete --- Key: ZOOKEEPER-822 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-822 Project: Zookeeper Issue Type: Bug Components: quorum Affects Versions: 3.3.0 Reporter: Vishal K Assignee: Vishal K Priority: Blocker Fix For: 3.3.2, 3.4.0 Attachments: 822.tar.gz, rhel.tar.gz, test_zookeeper_1.log, test_zookeeper_2.log, zk_leader_election.tar.gz, zookeeper-3.4.0.tar.gz, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822-3.3.2.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch, ZOOKEEPER-822.patch_v1 Created a 3 node cluster. 1. Fail the ZK leader. 2. Let leader election finish. Restart the leader and let it join. 3. Repeat. After a few rounds leader election takes anywhere from 25-60 seconds to finish. Note: we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyway, so logs from that node shouldn't matter. Look for START HERE. Logs after that point are the ones of interest. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822-3.3.2.patch
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822.patch
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Patch Available (was: Open) Thanks for the comments, Ben. I have modified zookeeperAdmin and added the zookeeper. prefix to the code. Regarding your question, initiateConnection is called from two methods: testInitiateConnection (used only in tests) and connectOne. connectOne is synchronized. Do you still see an issue?
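The concurrency argument in the comment above, that initiateConnection is safe because its only non-test caller, connectOne, is synchronized, can be illustrated with a minimal sketch. Class and method names below are modeled loosely on QuorumCnxManager but are invented for illustration; this is not ZooKeeper's actual code.

```java
import java.util.HashMap;
import java.util.Map;

class CnxManagerSketch {
    private final Map<Long, String> connections = new HashMap<>();

    /** Synchronized: only one thread at a time can initiate a connection,
     *  so initiateConnection never runs concurrently with itself. */
    synchronized void connectOne(long sid) {
        if (connections.containsKey(sid)) return; // already connected
        initiateConnection(sid);
    }

    private void initiateConnection(long sid) {
        connections.put(sid, "channel-to-" + sid); // stand-in for opening a socket
    }

    synchronized int connectionCount() { return connections.size(); }
}
```

Because every caller goes through the synchronized connectOne, a second thread racing to connect to the same server id simply finds the connection already recorded and returns.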
[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-702: --- Status: Open (was: Patch Available) I forgot to mention that the patch does not apply cleanly as uploaded. I had to delete the first two lines (generated by eclipse), but once I did, it applied cleanly. Abmar, could you upload a new patch? My +1 still holds...

GSoC 2010: Failure Detector Model
Key: ZOOKEEPER-702
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
Project: Zookeeper
Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch

Failure Detector Module
Possible Mentor: Henry Robinson (henry at apache dot org)
Requirements: Java, some distributed systems knowledge, comfort implementing distributed systems protocols
Description: ZooKeeper servers detect the failure of other servers and clients by counting the number of 'ticks' for which they don't get a heartbeat from other machines. This is the 'timeout' method of failure detection and works very well; however, it is possible that it is too aggressive and not easily tuned for some more unusual ZooKeeper installations (such as in a wide-area network, or even in a mobile ad-hoc network). This project would abstract the notion of failure detection to a dedicated Java module, and implement several failure detectors to compare and contrast their appropriateness for ZooKeeper. For example, Apache Cassandra uses a phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which is much more tunable and has some very interesting properties. This is a great project if you are interested in distributed algorithms, or want to help refactor some of ZooKeeper's internal code.
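The phi-accrual idea referenced above can be sketched in a few lines of Java. This is a toy version that models heartbeat inter-arrival times as exponentially distributed (the referenced paper uses a normal distribution); the class and method names are invented for illustration, not part of any ZooKeeper or Cassandra API.

```java
import java.util.ArrayDeque;

class PhiAccrual {
    private final ArrayDeque<Double> intervals = new ArrayDeque<>();
    private double lastHeartbeat = -1;

    /** Record a heartbeat arrival at time `now` (seconds). */
    void heartbeat(double now) {
        if (lastHeartbeat >= 0) {
            intervals.addLast(now - lastHeartbeat);
            if (intervals.size() > 100) intervals.removeFirst(); // sliding window
        }
        lastHeartbeat = now;
    }

    /** phi = -log10(P(next heartbeat still pending after this long)),
     *  with P modeled as exp(-elapsed/mean), i.e. exponential inter-arrivals.
     *  Unlike a fixed tick count, phi grows continuously as the silence
     *  lengthens, so the suspicion threshold is tunable. */
    double phi(double now) {
        double mean = intervals.stream().mapToDouble(Double::doubleValue)
                               .average().orElse(1.0);
        double elapsed = now - lastHeartbeat;
        return -Math.log10(Math.exp(-elapsed / mean));
    }
}
```

A monitor would suspect the peer once phi crosses some threshold (say 2 or 3), which the operator can tune per deployment instead of hard-coding a tick limit.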
[jira] Commented: (ZOOKEEPER-880) QuorumCnxManager$SendWorker grows without bounds
[ https://issues.apache.org/jira/browse/ZOOKEEPER-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915625#action_12915625 ] Flavio Junqueira commented on ZOOKEEPER-880: J-D, Has it happened just once or is it reproducible? Does it also happen with 3.3?

QuorumCnxManager$SendWorker grows without bounds
Key: ZOOKEEPER-880
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-880
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.2.2
Reporter: Jean-Daniel Cryans
Attachments: hbase-hadoop-zookeeper-sv4borg12.log.gz, hbase-hadoop-zookeeper-sv4borg9.log.gz, jstack

We're seeing an issue where one server in the ensemble has a steadily growing number of QuorumCnxManager$SendWorker threads, up to a point where the OS runs out of native threads, and at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like:
{noformat}
tickTime=3000
dataDir=/somewhere_thats_not_tmp
clientPort=2181
initLimit=10
syncLimit=5
server.0=sv4borg9:2888:3888
server.1=sv4borg10:2888:3888
server.2=sv4borg11:2888:3888
server.3=sv4borg12:2888:3888
server.4=sv4borg13:2888:3888
{noformat}
The issue is on the first server. I'm going to attach thread dumps and logs in a moment.
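The failure mode in this report, sender threads accumulating while receivers die, is the kind of leak that arises when only one half of a paired worker is torn down. A toy model (invented classes, not ZooKeeper's actual code) of the invariant a fix would enforce:

```java
/** Toy model of paired per-connection worker threads. */
class PairedWorkers {
    static class SendWorker {
        volatile boolean running = true;
        void finish() { running = false; } // stop the sender loop
    }
    static class RecvWorker {
        final SendWorker counterpart;
        RecvWorker(SendWorker counterpart) { this.counterpart = counterpart; }
        /** On channel EOF/IOException, tear down BOTH halves; leaving the
         *  counterpart running is exactly how senders would accumulate,
         *  one leaked sender per broken connection. */
        void onChannelBroken() { counterpart.finish(); }
    }
}
```

If each failed connection attempt (for example, a monitoring probe that connects and immediately closes) spawns a pair but only the receiver exits, the sender count grows by one per probe.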
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Open (was: Patch Available)
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822-3.3.2.patch
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822.patch
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Patch Available (was: Open) Thanks for reviewing it, Vishal. I have fixed the LOG.warn you pointed out and uploaded new patch files.
[jira] Commented: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12915034#action_12915034 ] Flavio Junqueira commented on ZOOKEEPER-702: In the previous comment, hopefully it was clear that I meant to say that the new tests are NOT working as expected. Apologies for the typo.
[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-702: --- Status: Open (was: Patch Available) Thanks for the updated patch, Abmar. The new tests, however, are not working as expected. More specifically, the methods in QuorumBase (createLearnersFD and createSessionsFD) are not being overridden as expected, which affects all new hammer tests. I haven't checked the other tests, but I suspect they suffer from the same problem. I'm canceling the patch for now.
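The overriding problem described in the comment above can be shown in miniature. A subclass override of a factory method only takes effect if the base class actually dispatches through that method; names below are invented, modeled loosely on QuorumBase's createLearnersFD, and are not the actual test code.

```java
class FdOverride {
    static class QuorumBaseSketch {
        /** Overridable factory; subclasses supply a different detector. */
        protected String createLearnersFD() { return "default-fd"; }

        /** This works only because setUp dispatches through the factory
         *  method. If it constructed the detector directly (bypassing the
         *  factory), subclass overrides would silently be ignored, which is
         *  the kind of symptom the canceled patch exhibited. */
        String setUp() { return createLearnersFD(); }
    }

    static class PhiQuorumTest extends QuorumBaseSketch {
        @Override protected String createLearnersFD() { return "phi-accrual-fd"; }
    }
}
```

With correct dispatch, each hammer-test subclass gets its own detector; with direct construction in the base class, every test would silently run the default one.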
[jira] Commented: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914822#action_12914822 ] Flavio Junqueira commented on ZOOKEEPER-823: Here is another instance:
{noformat}
Testcase: testPathValidation took 1.865 sec
Caused an ERROR
KeeperErrorCode = ConnectionLoss for /chrootclienttest
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /chrootclienttest
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:640)
at org.apache.zookeeper.test.ChrootClientTest.setUp(ChrootClientTest.java:42)
{noformat}
I'm on Mac OS X 10.5.8, java build 1.6.0_20-b02-279-9M3165.

update ZooKeeper java client to optionally use Netty for connections
Key: ZOOKEEPER-823
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-823
Project: Zookeeper
Issue Type: New Feature
Components: java client
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Fix For: 3.4.0
Attachments: NettyNettySuiteTest.rtf, TEST-org.apache.zookeeper.test.NettyNettySuiteTest.txt.gz, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch

This jira will port the client side connection code to use netty rather than direct nio.
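Intermittent ConnectionLoss during test setup, as in the trace above, is often worked around by retrying the failing operation. Below is a generic retry helper of that kind; it is a sketch under my own naming, not part of ZooKeeper's API, and backoff between attempts is omitted for brevity.

```java
import java.util.concurrent.Callable;

class Retry {
    /** Run op up to maxAttempts times, rethrowing the last failure if
     *  every attempt fails. Transient errors (e.g. a ConnectionLoss-style
     *  exception) succeed on a later attempt. */
    static <T> T withRetries(Callable<T> op, int maxAttempts) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // transient failure: try again (backoff omitted)
            }
        }
        throw new RuntimeException("all " + maxAttempts + " attempts failed", last);
    }
}
```

In a test, the flaky create call would be wrapped as `Retry.withRetries(() -> doCreate(), 3)` so a single dropped connection does not fail the whole suite.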
[jira] Commented: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913591#action_12913591 ] Flavio Junqueira commented on ZOOKEEPER-823: NettyNettySuiteTest is failing intermittently for me. I'm attaching logs for a run that failed.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822-3.3.2.patch
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Status: Open (was: Patch Available) Going to submit patches that introduce a system property.
[jira] Updated: (ZOOKEEPER-822) Leader election taking a long time to complete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Flavio Junqueira updated ZOOKEEPER-822: --- Attachment: ZOOKEEPER-822.patch