[ http://issues.apache.org/jira/browse/DERBY-1219?page=all ]
Bryan Pendleton updated DERBY-1219:
-----------------------------------
Attachment: no-sessions-for-closed-threads.diff
Hi Deepa,
I think that your scenarios are excellent, and they definitely demonstrate the
problems in this area.
The fun thing about a bug like this is that there are many possible
scenarios. :)
The one I was concentrating on is a bit different from yours, so let me try to
diagram it as follows:
1) Some thread is idling, blocked in NetworkServerControlImpl.getNextSession()
   as called by DRDAConnThread.run(). I think this is the standard place for
   a thread to block when it is idle.
2) A server restart occurs and runs straight through to completion. This
   results in calling close() on the thread from point (1), and also removing
   that thread from the ThreadList. *But the thread does not terminate.*
3) Some time later, new connections start coming in. The first new
   connection, as you point out, will create a new thread to handle the
   session. The next new connection, however, will find that a thread already
   exists, and so it will simply put the session onto the RunQueue list.
4) The original thread then wakes up, grabs the session, notices that the
   thread has been closed, and exits.
The point I'm trying to make here is that no overlapping of actions is
required, and the connection does not have to arrive during the restart.
It seems to me that, once a restart happens while one or more threads happen
to be sitting idle, blocked in their getNextSession() calls, those threads
are "poisoned", and there is now a ticking time bomb in the server. At some
point in the future, a session will get added to the RunQueue, and one of
these "poisoned" (closed) threads will grab the session and then terminate
prematurely without processing it.
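
The sequence above can be reproduced in miniature with the following Java
sketch. It is not Derby code: the Worker class, its closed flag, and the
BlockingQueue standing in for the RunQueue are all assumptions made for
illustration. It only shows how a thread that was closed while idle can take
a session and drop it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the scenario; names do not come from Derby.
class PoisonedThreadDemo {

    // Stand-in for a connection thread that idles waiting for sessions.
    static class Worker extends Thread {
        private final BlockingQueue<String> runQueue;
        private final AtomicInteger processed;
        private volatile boolean closed;

        Worker(BlockingQueue<String> runQueue, AtomicInteger processed) {
            this.runQueue = runQueue;
            this.processed = processed;
        }

        // Marks the thread closed but does not wake it or stop it.
        void close() {
            closed = true;
        }

        @Override
        public void run() {
            try {
                while (true) {
                    // Step 1: an idle thread blocks here.
                    String session = runQueue.take();
                    if (closed) {
                        // Step 4: the session was removed from the queue,
                        // but nobody will ever process it.
                        return;
                    }
                    processed.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Returns how many sessions were processed after the restart.
    static int runScenario() {
        BlockingQueue<String> runQueue = new LinkedBlockingQueue<>();
        AtomicInteger processed = new AtomicInteger();
        Worker worker = new Worker(runQueue, processed);
        try {
            worker.start();
            worker.close();            // Step 2: restart closes the thread.
            runQueue.put("session-2"); // Step 3: a later session is queued.
            worker.join(5000);         // The closed worker takes it and exits.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("sessions processed: " + runScenario());
    }
}
```

In this model the queued session is consumed but the processed count stays
at zero, which is the lost-session behavior described above.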
So the only place where I differ with your analysis, I believe, is that I
think it is *not* okay to leave these threads out there, marked as closed,
because at some point in the future the threads will grab sessions off the
run queue and fail to process them.
So I think one crucial thing to ensure is that, once a thread is marked as
closed, it will no longer pick up a new session to process.
With that in mind, I've experimented with yet another patch, called
"no-sessions-for-closed-threads.diff", which attempts to prevent threads
marked as closed from fetching sessions to process.
It seems to resolve the hang for me, but I haven't exhaustively tested it.
Still, I thought it showed enough promise to attach for you to examine.
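
As a rough illustration of the rule "a closed thread never picks up a new
session", here is a minimal Java sketch. It does not reproduce the attached
patch: the SessionQueue class, markClosed(), and the use of wait/notifyAll
are assumptions chosen only to show one way the rule could be enforced.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; not the contents of the attached diff.
class ClosedThreadsGetNoSessions {

    // Stand-in for the run queue, guarded by its own monitor.
    static class SessionQueue {
        private final Deque<String> sessions = new ArrayDeque<>();
        private final Set<Thread> closedThreads = new HashSet<>();

        synchronized void add(String session) {
            sessions.addLast(session);
            notifyAll();
        }

        // Marks a thread closed and wakes waiters so it can notice.
        synchronized void markClosed(Thread thread) {
            closedThreads.add(thread);
            notifyAll();
        }

        // Returns the next session, or null if the caller has been closed.
        synchronized String getNextSession() throws InterruptedException {
            Thread me = Thread.currentThread();
            while (!closedThreads.contains(me)) {
                if (!sessions.isEmpty()) {
                    return sessions.removeFirst();
                }
                wait();
            }
            // Closed: leave any queued sessions for threads that are alive.
            return null;
        }
    }

    // Returns the session obtained by a live caller after the restart.
    static String runScenario() {
        SessionQueue queue = new SessionQueue();
        Thread idle = new Thread(() -> {
            try {
                while (queue.getNextSession() != null) {
                    // A real worker would process the session here.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            idle.start();
            queue.markClosed(idle);   // Restart closes the idle thread.
            idle.join();              // It exits without taking a session.
            queue.add("session-2");   // A later connection queues a session.
            return queue.getNextSession(); // A live thread still receives it.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("live thread got: " + runScenario());
    }
}
```

The closed check and the dequeue happen under the same lock, so in this
model a thread cannot be marked closed between deciding to take a session
and actually taking it.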
> jdbcapi/checkDataSource.java and jdbcapi/checkDataSource30.java hang
> intermittently with client
> -----------------------------------------------------------------------------------------------
>
> Key: DERBY-1219
> URL: http://issues.apache.org/jira/browse/DERBY-1219
> Project: Derby
> Type: Test
> Components: Network Server, Network Client
> Versions: 10.2.0.0
> Environment: More often on jdk 1.5 or jdk 1.6 but hangs on jdk 1.4.2 as well
> Reporter: Kathey Marsden
> Assignee: Bryan Pendleton
> Priority: Minor
> Attachments: client_stack_trace_050306.txt, drda_traces_050206.zip,
> interrupt.diff, no-sessions-for-closed-threads.diff,
> server_stack_trace_050306.txt, skipThreads.diff, testfiles_afterhang.zip,
> traces_on_hang.txt
>
> The tests checkDataSource.java and checkDataSource30.java
> hang intermittently especially with jdk 1.5.
> Attached is the test run output and traces when the server is started
> separately.
> 1) Enable checkDataSource30.java by taking it out of
> functionTests/suites/DerbyNetClient.exclude.
> 2) Run the test with client.
> java -Dij.exceptionTrace=true -Dkeepfiles=true -Dframework=DerbyNetClient
> org.apache.derbyTesting.functionTests.harness.RunTest
> jdbcapi/checkDataSource30.java
> Attachments:
> testfiles_after_hang.zip - Test directory.
> traces_on_hang.txt - Server side traces obtained by starting the server
> separately before running the test.
> I wish I had time to work on this right now as I would really like to see
> this valuable test in the suite, but hopefully someone else will pick it up.