I've been trying to find a bug for a number of months with no success.  I figured 
someone on this list might have encountered something similar and would be able to 
help.

First of all, the project I'm working on is WhichBot, a bot for Natural Selection.  
All the code is available at http://whichbot.com via SourceForge, if you're interested.

The problem is that the server hangs (CPU usage goes to 100%, console unresponsive) 
intermittently after about 1 to 10 hours of humans vs. bots gameplay.  This only seems 
to happen on Win32, not on Linux, although admittedly the testing focus has been on 
Win32 systems.  It doesn't appear to matter what version of the HL engine you're 
running.

If you attach a debugger to the hung process, it will be in swds.dll and never seems 
to exit or re-enter the bot code.  The call stack does not include the bot code, so I 
don't even know what the last piece of code executed in the bot was.  I'm assuming 
that the HL engine code is looping infinitely, since the CPU usage is at 100%.

I did find one way of getting this to happen: if you have a divide-by-zero bug and 
pass a NaN to RunPlayerMove, you will see behaviour like this.  I wrapped every HL 
call that takes an angle to ensure the angle is in the normalized range 
(-180 < angle < 180) and that the angle is finite.  After that, I thought I'd finally 
fixed it, but no, the problem is still there, so it must be something else.
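
For reference, the kind of check I mean looks roughly like the sketch below.  This is 
an illustration only, not the actual WhichBot code: the helper name SanitizeAngle and 
the fallback value are made up for this example, and wrapping into the half-open range 
[-180, 180) is an assumption about which endpoint to keep.

```cpp
#include <cmath>

// Hypothetical helper (not from the WhichBot source): returns a finite angle
// in degrees, wrapped into [-180, 180). NaN or infinite input is replaced by
// `fallback` so that it never reaches the engine.
inline float SanitizeAngle(float angle, float fallback = 0.0f)
{
    if (!std::isfinite(angle)) {
        return fallback;
    }
    // fmod keeps the sign of the dividend, so the result lies in (-360, 360).
    float wrapped = std::fmod(angle, 360.0f);
    if (wrapped >= 180.0f) {
        wrapped -= 360.0f;
    } else if (wrapped < -180.0f) {
        wrapped += 360.0f;
    }
    return wrapped;
}
```

Every angle component would go through a helper like this just before being handed to 
RunPlayerMove or any other engine call that takes angles.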

So, I guess the starting questions would be:

1) Does anyone else know ways of getting this kind of behaviour out of the HL engine?
2) I'm guessing there's no version of the engine DLLs available that has debugging 
symbols (which might at least give me a clue to the problem area)?
3) Failing both of those, does anyone have a good idea on how to approach the problem?

The trouble is that the problem is intermittent, so it's hard to even verify if it is 
still there.  I'm pretty sure it has been happening for a long time now, so rolling 
back the code to a "known good" version would roll it back to a version where I know 
for a fact there are a bunch of bugs that cause other stability issues.  Even then, if 
I rolled it back and fixed all the bugs I could find, I wouldn't be sure that the hang 
bug wasn't there without running a server with an ancient version for a week or two.

I'd be most grateful for any help that people could offer, because this bug has me at 
the end of my tether.

Mike Cooper.
