[ http://issues.apache.org/jira/browse/DERBY-1219?page=all ]

Bryan Pendleton updated DERBY-1219:
-----------------------------------

    Attachment: no-sessions-for-closed-threads.diff

Hi Deepa,
I think that your scenarios are excellent, and they definitely demonstrate the
problems in this area.
The fun thing about a bug like this is that there are many possible
scenarios. :)

The one I was concentrating on is a bit different from yours, so let me try to 
diagram it as follows:

1) Some thread is idling, blocked in NetworkServerControlImpl.getNextSession()
   as called by DRDAConnThread.run(). I think this is the standard place for a
   thread to block when it is idle.

2) Server restart occurs, and runs straight through to completion. This results
   in calling close() on the thread from point (1), and also removing that
   thread from the ThreadList. *But the thread does not terminate.*

3) Some time later, some new connections start coming in. The first new
   connection, as you point out, will create a new thread to handle the
   session. The next new connection, however, will find that a thread already
   exists, and so it will simply put the session onto the RunQueue list.

4) The original thread then wakes up, grabs the session, notices that the
   thread has been closed, and exits without processing the session.

The point I'm trying to make here is that no overlapping of actions is
required, and the connection does not have to arrive during the restart.
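
To make the four steps concrete, here is a minimal, self-contained sketch of
the sequence. It is not Derby code: RunQueue, Worker, and runScenario() are
hypothetical stand-ins, sessions are modeled as plain strings, and the only
behavior assumed from the description above is that close() marks the thread
as closed without waking it.

```java
import java.util.LinkedList;
import java.util.concurrent.atomic.AtomicInteger;

class PoisonedThreadDemo {

    /** Hypothetical stand-in for the server's queue of pending sessions. */
    static final class RunQueue {
        private final LinkedList<String> sessions = new LinkedList<>();

        synchronized void add(String session) {
            sessions.add(session);
            notifyAll();
        }

        // Blocks until a session is available; knows nothing about closed threads.
        synchronized String getNextSession() throws InterruptedException {
            while (sessions.isEmpty()) {
                wait();
            }
            return sessions.removeFirst();
        }
    }

    /** Hypothetical stand-in for a connection thread. */
    static final class Worker extends Thread {
        private final RunQueue queue;
        private volatile boolean closed;
        final AtomicInteger processed = new AtomicInteger();

        Worker(RunQueue queue) {
            this.queue = queue;
            setDaemon(true);
        }

        // Assumption: close() only sets a flag and does not wake the thread.
        void close() {
            closed = true;
        }

        @Override
        public void run() {
            try {
                while (!closed) {
                    String session = queue.getNextSession(); // step 1: idle here
                    if (closed) {
                        return; // step 4: session was taken, but is dropped
                    }
                    processed.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Runs steps 1 to 4 and returns how many sessions the worker processed. */
    static int runScenario() {
        try {
            RunQueue queue = new RunQueue();
            Worker idle = new Worker(queue);
            idle.start();
            // Step 1: wait until the worker is blocked inside getNextSession().
            while (idle.getState() != Thread.State.WAITING) {
                Thread.sleep(10);
            }
            idle.close();                   // step 2: restart closes the thread
            queue.add("second-connection"); // step 3: a later session is queued
            idle.join();                    // step 4: worker wakes up and exits
            return idle.processed.get();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("sessions processed: " + runScenario());
    }
}
```

Nothing in the sketch overlaps with the restart: the session is queued only
after close() has returned, and it is still taken off the queue and dropped.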

It seems to me that, once a restart happens while one or more threads happen
to be sitting idle, blocked in their getNextSession() calls, then those threads
are "poisoned", and there is now a ticking time bomb in the server. At some
point in the future, a session will get added to the RunQueue, and one of these
"poisoned" (closed) threads will grab the session, and will then terminate
prematurely without processing the session.

So the only place where I differ with your analysis, I believe, is that I think
it is *not* okay to leave these threads out there, marked as closed, because at
some point in the future the threads will grab sessions off the run queue and
fail to process them.

So I think one crucial thing to ensure is that, once a thread is marked as
closed, it will no longer pick up a new session to process.

With that in mind, I've experimented with yet another patch, called
"no-sessions-for-closed-threads.diff", which attempts to prevent threads marked
as closed from fetching sessions to process.

It seems to resolve the hang for me, but I haven't exhaustively tested it.
Still, I thought it showed enough promise to attach for you to examine.
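
For readers without the attachment, the rule "a closed thread never picks up a
new session" could be sketched roughly as below. This is only an illustration
of the idea, not the contents of the diff: the BooleanSupplier parameter, the
wakeAll() helper, and the class names are all assumptions made for the example.
The closed check is performed inside the queue's lock, immediately before a
session is removed, and close() wakes any idle waiters so they can exit.

```java
import java.util.LinkedList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

class ClosedAwareQueueDemo {

    /** Hypothetical queue that refuses to hand sessions to closed callers. */
    static final class RunQueue {
        private final LinkedList<String> sessions = new LinkedList<>();

        synchronized void add(String session) {
            sessions.add(session);
            notifyAll();
        }

        synchronized void wakeAll() {
            notifyAll();
        }

        synchronized int size() {
            return sessions.size();
        }

        // Returns null, without touching the queue, once the caller is closed.
        synchronized String getNextSession(BooleanSupplier callerClosed)
                throws InterruptedException {
            while (true) {
                if (callerClosed.getAsBoolean()) {
                    return null;
                }
                if (!sessions.isEmpty()) {
                    return sessions.removeFirst();
                }
                wait();
            }
        }
    }

    /** Hypothetical stand-in for a connection thread. */
    static final class Worker extends Thread {
        private final RunQueue queue;
        private volatile boolean closed;
        final AtomicInteger processed = new AtomicInteger();

        Worker(RunQueue queue) {
            this.queue = queue;
            setDaemon(true);
        }

        // Assumption: close() also wakes idle waiters so they notice the flag.
        void close() {
            closed = true;
            queue.wakeAll();
        }

        @Override
        public void run() {
            try {
                while (true) {
                    String session = queue.getNextSession(() -> closed);
                    if (session == null) {
                        return; // closed: exit without having taken a session
                    }
                    processed.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Returns {processed by old thread, processed by new thread, left in queue}. */
    static int[] runScenario() {
        try {
            RunQueue queue = new RunQueue();
            Worker old = new Worker(queue);
            old.start();
            while (old.getState() != Thread.State.WAITING) {
                Thread.sleep(10); // wait until the old thread is idle
            }
            old.close(); // restart: the idle thread is closed and woken
            old.join();  // it exits instead of lingering in the wait

            Worker fresh = new Worker(queue);
            fresh.start();
            queue.add("second-connection");
            while (fresh.processed.get() < 1) {
                Thread.sleep(10); // wait for the live thread to handle it
            }
            fresh.close();
            fresh.join();
            return new int[] {old.processed.get(), fresh.processed.get(), queue.size()};
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = runScenario();
        System.out.println("old=" + r[0] + " new=" + r[1] + " queued=" + r[2]);
    }
}
```

Checking the flag under the same lock that guards the queue is the design
choice that matters here: a thread cannot be marked closed and woken between
its check and its removal of a session, so a queued session is only ever
handed to a thread that was still open at that moment.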


> jdbcapi/checkDataSource.java and jdbcapi/checkDataSource30.java hang intermittently with client
> -----------------------------------------------------------------------------------------------
>
>          Key: DERBY-1219
>          URL: http://issues.apache.org/jira/browse/DERBY-1219
>      Project: Derby
>         Type: Test

>   Components: Network Server, Network Client
>     Versions: 10.2.0.0
>  Environment: More often on jdk 1.5 or jdk 1.6 but hangs on jdk 1.4.2 as well
>     Reporter: Kathey Marsden
>     Assignee: Bryan Pendleton
>     Priority: Minor
>  Attachments: client_stack_trace_050306.txt, drda_traces_050206.zip, 
> interrupt.diff, no-sessions-for-closed-threads.diff, 
> server_stack_trace_050306.txt, skipThreads.diff, testfiles_afterhang.zip, 
> traces_on_hang.txt
>
> The tests checkDataSource.java and checkDataSource30.java hang
> intermittently, especially with jdk 1.5.
> Attached are the test run output and traces when the server is started
> separately.
> 1) Enable checkDataSource30.java by taking it out of
>    functionTests/suites/DerbyNetClient.exclude.
> 2) Run the test with client.
> java -Dij.exceptionTrace=true -Dkeepfiles=true -Dframework=DerbyNetClient org.apache.derbyTesting.functionTests.harness.RunTest jdbcapi/checkDataSource30.java
> Attachments:
>  testfiles_after_hang.zip - Test directory.
>  traces_on_hang.txt  - Server side traces obtained by starting the server
>    separately before running the test.
> I wish I had time to work on this right now as I would really like to see
> this valuable test in the suite, but hopefully someone else will pick it up.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
