> In any case, I'm not real confident about the fix because we couldn't
> reproduce the problem here. My original question is whether anyone knew
> of a way to control threads so that they ran deterministically so we
> could run tests.

Having several years' experience with server-side multithreading, and
having run into similar problems myself, let me offer some advice.

1.  Get a multi-CPU box.  If your company doesn't test on a box with
multiple CPUs, then it hasn't truly tested its multithreaded app.  A
multi-CPU box will turn up threading problems that single-CPU systems will
never find.  (Yes, I've had it happen: about 50 installs on various
hardware ran for years without any problems, then along came a dual-CPU box
that couldn't run the app.)

2.  Write a client that can hammer your server app from across the network.
Just completely slam it for several days.  The more random and the more
diverse the data you send, the better.  If you can add a feature to your
server that saves "transactions" (in your case they don't sound like real
transactions) so your test client can "replay" the data, that'd be great.
Just record several weeks' worth of data and play it all back at once.
Yes, that's how I test.
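
A minimal sketch of the record-and-replay idea in point 2, in Python for
brevity.  Everything here is an assumption for illustration: the
TransactionLog class, the length-prefixed log format, and the replay()
helper are hypothetical names, not anything from your server.  The server
would call record() on each incoming payload, and the test client would
later push the whole log back through a pool of threads.

```python
import struct
import threading
from concurrent.futures import ThreadPoolExecutor

_HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix (assumed format)


class TransactionLog:
    """Append-only log of raw request payloads (hypothetical helper)."""

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()  # server threads may record concurrently

    def record(self, payload: bytes) -> None:
        # Write the length, then the bytes, so records can be split apart later.
        with self._lock, open(self._path, "ab") as f:
            f.write(_HEADER.pack(len(payload)) + payload)

    def read_all(self):
        with open(self._path, "rb") as f:
            while True:
                header = f.read(_HEADER.size)
                if len(header) < _HEADER.size:
                    return  # end of log (or a truncated trailing record)
                (length,) = _HEADER.unpack(header)
                yield f.read(length)


def replay(log, send, workers=16):
    """Fire every recorded payload at `send` from a pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() collects every result, so an exception in send() surfaces here.
        return list(pool.map(send, log.read_all()))
```

Here send is whatever delivers one payload to the server, for example a
function that opens a connection with socket.create_connection() and calls
sendall().  Raising workers is the knob for playing it all back "at once".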

3.  Move your model over to I/O completion ports (yes guys, I'm still
riding the IOCP bandwagon).  Here's the general way I'd run your process,
if it's possible (I don't know the nature of your data).  I'd memory map a
huge file, then issue a whole set of reads against your driver that put the
data in consecutive chunks of the file.  When the file starts getting full,
I'd open a second file and get ready to issue reads on it.  As the first
file finally fills up, I'd issue all the reads to fill the second file.
Because of the way IOCP works, the thread that is handling the last reads
of the first file and setting up the second file may run out of its time
slice, and another IOCP thread may get triggered.  To cover that case, keep
a flag, guarded by a critical section, that indicates whether the second
file is already being created.  Because there isn't going to be a lot of
cross-thread locking or communication, you can largely take threading
problems out of your search for bugs.
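
The guarded flag from point 3 can be sketched like this.  It's in Python
rather than Win32, so threading.Lock stands in for the critical section,
and FileRollover and create_file are hypothetical names for illustration.
It shows only the "has someone already set up the next file?" check, not
IOCP itself.

```python
import threading


class FileRollover:
    """Sets up each file once, however many worker threads ask for it."""

    def __init__(self, create_file):
        self._create_file = create_file  # hypothetical callable: index -> file
        self._lock = threading.Lock()    # plays the role of the critical section
        self._files = {}                 # index -> file that was already set up

    def ensure(self, index):
        with self._lock:
            # The dictionary entry is the flag: present means some thread
            # already created (or is holding the lock while creating) the file.
            if index not in self._files:
                self._files[index] = self._create_file(index)
            return self._files[index]
```

Any completion thread that notices the current file is nearly full calls
ensure(next_index).  Only the first caller pays for the setup, the rest get
the same file back.  Creation runs while the lock is held, which keeps the
sketch simple but makes other callers wait; that seems acceptable if
rollover is rare.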



BTW, what do you mean by "weak code" that you fixed?  Either it works right
or it doesn't.  If you found room for error and a resultant crash, then this
isn't weak code, it's faulty code.  And if you can't find another such
fault in your code and you can't reproduce the problem with the testing in
1 and 2 above, I'd write it off.

/dev


