Hi Fletch,

I've recently been hearing about instability issues specific to the Linux SRCDS. Over the last few months of updates we have also been seeing quirky stability issues. It might be on our end, but that seems unlikely, as nothing major has changed there. What we're seeing is:
1) Sometimes, out of nowhere, everyone will just time out. The server keeps running and the host stays completely accessible. I doubt our hosting location had a peering issue, since we would have lost our remote connection to the host as well, but it's not impossible. Either way, this might be worth keeping an eye out for.

2) During a map change the server will restart.

Unfortunately I do not have anything actually useful (like a dump or an error message), but I am curious whether others have been seeing these same things. This can happen with or without MMS/SourceMod running.

Does any of this sound even remotely relevant to the memory scribble you mentioned?

-BloodyIron

----- Original Message -----
From: Fletcher Dunn <[email protected]>
Date: Monday, July 25, 2011 2:16 pm
Subject: [hlds] TF2 crashes
To: "'[email protected]'" <[email protected]>, "'[email protected]'" <[email protected]>

> Hey guys,
>
> A status update on the crashes. I have identified what I think are 3 different problems.
>
> 1.) There's a bug in the replay system due to a flaw in libcurl using a signal to handle DNS timeout. You can avoid this bug by using IP addresses in your replay config, rather than DNS names. We will have a software workaround in the next update or so that essentially does this same thing automatically.
>
> 2.) There's a random memory scribble. It will manifest itself as a "double free" or "memory corruption" crash, depending on your OS. Some have theorized that this is due to the Dr. G weapons. We cannot confirm this.
>
> 3.) There is a hang. From what information I have gathered, the last thing in the log is something along the lines of "PreMinidumpCallback: updating dump comment." In other words, it is hanging while attempting to report the crash. This is particularly disastrous because not only will it interfere with auto-restart scripts (unless you have some sort of watchdog), but it also prevents the crash report from being generated and submitted, which of course would help us fix it.
>
> A random memory scribble can cause all sorts of behaviour, so it's possible that #2 is the real bug, and #3 is just a side effect that sometimes attends the main bug.
>
> We have not been able to reproduce any of these issues internally, and we have had several playtests. (We, the actual developers, not a separate QA department and not a group of interns, playtest the game every day, on Windows and Linux servers.) However, our dedicated servers have experienced the hang.
>
> It has been very difficult to track down and fix these crashes because we seem to have had several regressions all at once, and at least one of the problems is interfering with the normal reporting mechanism. If anyone is able to save a dump file (they usually go to /tmp/dumps), it would be great if you could post them in some webspace and post a URL where they may be downloaded. Or, if your console log shows that it was uploaded, please post the report ID. The output will look something like this:
>
> PreMinidumpCallback: updating dump comment
> Uploading dump (in-process) [proxy '']
> /tmp/dumps/crash_20110723191817_1.dmp
> success = yes
> response: CrashID=bp-445d6055-e9e7-420a-93b8-688a92110723
>
> Grabs of GDB stack traces, etc., with raw addresses, are not totally useless, but they are definitely much less useful. Even with symbols, a stack trace does not have as much data as a dump has. So if we could get some actual dumps, that would be really great.
>
> These crashes continue to be our top priority.
>
> - Fletch
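For anyone who wants to pull report IDs or dump paths out of a console log, or have a watchdog notice the hang described in #3, here is a rough Python sketch. It assumes the log looks like the sample output quoted above. The function names are made up for illustration, and the CrashID pattern is inferred from that single sample, so it may need adjusting for real logs.

```python
import re
import sys
from typing import List

# Marker string taken from the quoted log excerpt; assumed to be stable.
HANG_MARKER = "PreMinidumpCallback: updating dump comment"

# Patterns inferred from the one sample in the quoted message (assumption).
CRASH_ID_RE = re.compile(r"CrashID=(bp-[0-9a-fA-F-]+)")
DUMP_PATH_RE = re.compile(r"(/tmp/dumps/\S+\.dmp)")


def find_crash_ids(log_text: str) -> List[str]:
    """Return every uploaded report ID found in the console log text."""
    return CRASH_ID_RE.findall(log_text)


def find_dump_paths(log_text: str) -> List[str]:
    """Return every dump file path under /tmp/dumps mentioned in the log."""
    return DUMP_PATH_RE.findall(log_text)


def looks_hung(log_text: str) -> bool:
    """Guess at the hang from #3: the marker is the last non-empty log line,
    meaning nothing was written after the crash reporter started."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    return bool(lines) and HANG_MARKER in lines[-1]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python crashlog.py /path/to/console.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    print("report IDs:", find_crash_ids(text))
    print("dump files:", find_dump_paths(text))
    print("possible hang:", looks_hung(text))
```

A watchdog could call looks_hung() on the log periodically and restart the server if it stays true for a while. It only inspects the last line, so a server that keeps logging after the marker would not be flagged.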
_______________________________________________ To unsubscribe, edit your list preferences, or view the list archives, please visit: http://list.valvesoftware.com/mailman/listinfo/hlds

