Status: New
Owner: ----
Labels: Type-Defect Priority-Medium
New issue 74 by [email protected]: Race condition in TCP Network Guard
can leave socket still listening for incoming connection after destroy()
https://code.google.com/p/jdiameter/issues/detail?id=74
What steps will reproduce the problem?
1. Check out source for either 1.6.0.FINAL or 1.7.0-SNAPSHOT
2. Build source with mvn clean install
3. Build testsuite with mvn clean install
What is the expected output? What do you see instead?
All test suite tests to pass. Instead of this I observe occasional non
deterministic failures.
What version of the product are you using? On what operating system?
1.6.0.FINAL or 1.7.0-SNAPSHOT
Please provide any additional information below.
The test suite tests do not always pass successfully every time. I
investigated why this was and in one case have traced it to a bug in the
TCP version of the Network guard
(core/jdiameter/impl/src/main/java/org/jdiameter/server/impl/io/tcp/NetworkGuard.java).
In this class the constructor creates a GuardTask object to open the TCP
server socket to listen for connections from other diameter peers. The
GuardTask implements the Runnable interface and is run in a separate thread
(it can be executed with a configurable delay). The problem is that the
GuardTask cleanTask() member (which is called by the NetworkGuard destroy()
member) does not correctly check that the Thread that opens the server
socket has completed. It simply checks if the server socket variable is not
null therefore there is a race condition between the thread creating the
server socket and the thread destroying the NetworkGuard. This can result
in the socket being opened after the NetworkGuard task destroy() member has
been called.
In the teas suite tests this can manifest itself with a test not destroying
its server socket which in turn means the next test fails because it cannot
it self open the server socket (address already in use exception).
I have attached a modified version of the NetworkGuard class where I have
tried to fix this issue. In this version I maintain a handle to the Guard
Task:
private class GuardTask implements Runnable {
private Thread thread;
private Selector selector;
private ServerSocket serverSocket;
ScheduledFuture<?> handle = null; // HANDLE TO GUARD TASK
private final ScheduledExecutorService binder =
Executors.newSingleThreadScheduledExecutor();
public GuardTask(final InetSocketAddress addr) throws IOException {
if(bindDelay > 0) {
logger.info("Socket binding will be delayed by {}ms...", bindDelay);
}
Runnable task = new Runnable() {
public void run() {
try {
logger.debug("Binding {} after delaying {}ms...", addr,
bindDelay);
final ServerSocketChannel ssc = ServerSocketChannel.open();
ssc.configureBlocking(false);
serverSocket = ssc.socket();
serverSocket.bind(addr);
selector = Selector.open();
ssc.register(selector, SelectionKey.OP_ACCEPT, addr);
logger.info("Open server socket {} ", serverSocket);
}
catch (IOException e)
{
logger.error("Error opening server socket", e);
throw new RuntimeException(e);
}
}
};
// KEEP HANDLE TO GUARD TASK THAT OPENS SERVER SOCKET
//---------------------------------------------------
handle = binder.schedule(task, bindDelay, TimeUnit.MILLISECONDS);
}
Then later in the clean task I use the handle to check the server socket
has been setup:
public void cleanTask() {
try {
if (thread != null) {
thread.join(2000);
if (thread.isAlive()) {
// FIXME: remove ASAP
thread.interrupt();
}
thread = null;
}
}
catch (InterruptedException e) {
logger.debug("Can not stop thread", e);
}
// USE SERVER SOCKET CREATION HANDLE TO MAKE SURE THE
// SOCKET HAS BEEN OPENED. THE SERVER SOCKET MAY BE IN THE
// PROCESS OF BEING CREATED...
if (selector == null)
{
logger.debug("Selector is null... checking that it is not in the
process of being started");
if (handle != null)
{
try
{
handle.get(10, TimeUnit.SECONDS);^M
}
catch (Exception e)
{
logger.error("Exception thrown waiting for GuardTask to
complete", e);
}
}
else
{
logger.debug("Handle is null");^M
}
}
As you can see in this version the code waits for 10 seconds to make sure
the server socket has been opened. This is okay for when the GuardTask has
been start without any delay which is the case for all the tests in the
test suite. I have not considered the case where the server socket is
opened after a configurable delay.
Note that in the attached modified file I also added a number of logger
trace statements whilst I was debugging this issue.
Attachments:
NetworkGuard.java 11.6 KB
--
You received this message because this project is configured to send all
issue notifications to this address.
You may adjust your notification preferences at:
https://code.google.com/hosting/settings
--
---
You received this message because you are subscribed to the Google Groups "mobicents-all-issues-changes" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.