[ https://issues.apache.org/jira/browse/ZOOKEEPER-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978705#action_12978705 ]
Flavio Junqueira commented on ZOOKEEPER-932: -------------------------------------------- Hi Vishal, I'll have a look, thanks! If you have a chance, perhaps you can make it available through http://reviews.apache.org. It would be nice if we started using that review board tool more often, do you agree? > Move blocking read/write calls to SendWorker and RecvWorker Threads > ------------------------------------------------------------------- > > Key: ZOOKEEPER-932 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-932 > Project: ZooKeeper > Issue Type: Sub-task > Affects Versions: 3.3.2 > Reporter: Vishal K > Assignee: Vishal K > Fix For: 3.3.3 > > Attachments: ZOOKEEPER-932.patch, ZOOKEEPER-932.patch > > > Copying relevant comments: > Vishal K added a comment - 02/Nov/10 02:09 PM > Hi Flavio, > I have a suggestion for changing the blocking IO code in QuorumCnxManager. It > keeps the current code structure and requires a small amount of changes. I am > not sure if these comments should go in ZOOKEEPER-901. ZOOKEEPER-901 is > probably addressing netty as well. Please feel free to close this JIRA if you > intend to make all the changes as a part of ZOOKEEPER-901. > Basically we jusy need to move parts of initiateConnection and > receiveConnection to SenderWorker and ReceiveWorker. > A. Current flow for receiving connection: > 1. accept connection in Listener.run() > 2. receiveConnection() > * Read remote server's ID > * Take action based on my ID and remote server's ID (disconnect and > reconnect if my ID is > remote server's ID). > * kill current set of SenderWorker and ReciveWorker threads > * Start a new pair > B Current flow for initiating connection: > 1. In connectOne(), connect if not already connected. else return. > 2. send my ID to the remote server > 3. if my ID < remote server disconnect and return > 4. if my ID > remote server > * kill current set of SenderWorker and ReceiveWorkter threads for the > remote server > * Start a new pair > Proposed changes: > Move the code that performs any blocking IO in SenderWorker and ReceiveWorker. > A. Proposed flow for receiving connection: > 1. accept connection in Listener.run() > 2. receiveConnection() > * kill current set of SenderWorker and ReciveWorker threads > * Start a new pair > Proposed changed to SenderWorker: > * Read remote server's ID > * Take action based on my ID and remote server's ID (disconnect and > reconnect if my ID is > remote server's ID). > * Proceed to normal operation > B Proposed flow for initiating connection: > 1. in connectOne(), return if already connected > 2. Start a new SenderWorker and ReceiveWorker pair > 2. In SenderWorker > * connect to remote server > * write my ID > * if my ID < remote server disconnect and return (shutdown the pair). > * Proceed to normal operation > Questions: > * In QuorumCnxManager, is it necessary to kill the current pair and > restart a new one every time we receive a connect request? > * In receiveConnection we may choose to reject an accepted connection if > a thread in > SenderWorker is in the process of connecting. Otherwise a server with > ID < > remote server may keep sending frequent connect request that will > result in the > remote server closing connections for this peer. But I think we add a > delay > before sending notifications, which might be good enough to prevent this > problem. > Let me know what you think about this. I can also help with the > implementation. > Flavio Junqueira added a comment - 03/Nov/10 05:28 PM > Hi Vishal, I like your proposal, it seems reasonable and not difficult to > implement. > On your questions: > 1. I don't think it is necessary to kill a pair SenderWorker/RecvWorker > every time, and I'd certainly support changing it; > 2. I'm not sure where you're suggesting to introduce a delay. In the FLE > code, a server sends a new batch of notifications if it changes its vote or > if it times out waiting for a new notification. This timeout value increases > over time. I was actually thinking that we should reset the timeout value > upon receiving a notification. I think this is a bug.... > Given that it is your proposal, I'd be happy to let you take a stab at it and > help you out if you need a hand. Does it make sense for you? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.