Hi All,


I have encountered something that I would describe as weird behaviour, although 
I must admit that I'm not an expert on Firebird and cannot say whether this 
should be expected or not. So... first some basic info:
We have a system which relies on Firebird for persistent storage. The system is 
fairly mature however new functions are added as needed.

The system uses firebird 2.5.4 superserver, ibproviderv2 and MDAC 2.7, however 
when researching the issue I have updated our reference system to firebird 
2.5.6 since source-code and pdb's were available for this version. The issue 
still occurs using 2.5.6. We are using WAIT transactions (e.g. the default 
transaction mode).

A thread is trying to perform a SQL select, a client wants to display detailed 
information for a certain row.
Basicly Select * from TABLE where PKEY = PKEY
The thread acquires lock in our server, then works it's way through GDS32.dll 
(fbclient.dll) and then times out on the call to WinSock2 select (I guess it 
retries if no success, because it never returns).
MON$STATEMENT shows a stalled SQL statement corresponding to the SQL statement 
above.

So it all boils down to is this behavior due to bad transaction management on 
our part or is it an issue with firebird?


Background + additional info:
I have been looking into an issue reported by one customer where the system 
seem to deadlock when working with a specific table. In the configuration used 
the system has four clients operating against a server which in turn utilizes 
Firebird.
Now the clients basically displays the table as a grid. The user can "open" a 
row in order to display more detailed information. When this occurs the server 
changes the state of the row (we have a column in the databese for this) to 
open. The user choose to perform an operation on the row (changing the same 
column) and closes the detailed view again changing status of the row. For each 
status change the clients will reload the row in order display updated 
information.

Now everything seems to be working OK when doing work from only one client. 
When working with two clients at a moderate pace things are also working OK. 
But at a high (to frantic) pace a deadlock will occur within 15 minutes. Which 
might suggest some kind of race condition. The thing is that these status 
changes I mentioned when describing the "work flow" can be done with the 
keyboard using a command sequence similar to [Ctrl+O], [Ctrl+O], [Enter] 
allowing the user to change the status 3 times in less than a second. And when 
I write high pace thats what I mean. So in order to reproduce the issue(at the 
office) I need two users hammering away at their keyboards for 15 minutes. 
Please note that the table in this case contains less than 10000 rows, 
sometimes as few as a 1000 rows. However at the customers site the problem is 
easier to reproduce. There the issue seem to occur when having one user working 
at high pace and one doing the work flow at lets say 15 seconds per row. The 
hardware is the same in these cases(model,manufacturer etc), the users are 
different though so maybe the reason why it's harder to reproduce at the office 
(the network should have the same topology etc.)

So, since this seemed to be a deadlock which I presumed occurred within the 
scope of our server I created memory dump which showed a deadlock where one 
server side thread had acquired a lock in our server and then seemed to be 
stuck on the winsock call 'select' but I could not get at decent stacktrace due 
to missing symbol files. I have since then changed the firebird to version 
2.5.6/downloaded symbolfiles and also inserted some very basic trace-logging 
around the suspected culprit(the select call). I have also set the network 
timeout values in the firebird.conf since code review in fbclient showed these 
might be used as timeout parameter to the winsock2 select call.
 I have also added a some code to make sure a blocking call to select doesn't 
occur (always set the timeout if not set). However the timeout used when 
reproducing the issue comes from the firebird.conf file. After these 
modifications I have deduced the following:


Info from memory dump and my tracelogs etc are available upon request.

Thank you in advance!

/John Karlsson
 

Reply via email to