An interesting result: the breakpoint was hit (rsCheckData had been corrupted to 0x00000001), but rsThreadData - the value created from the same pointer when the function was called - was NOT corrupted:

rsCheckData = 0x00000001
rsThreadData = 0x02F51F54

On the outside of the function (up the call stack one level to the place where it is called), psThreadData (still held in ESI) is valid (0x02F51F54). Thus, although it was passed in twice with the same value, the first one got corrupted on entry! 

OK, this probably means that some other piece of code is trashing the memory which holds this variable.  It also shows that the compiler is doing the right thing: it pushes the same value twice, once per argument, and only the last one pushed gets corrupted.  (Arguments are pushed right to left, so the last one pushed is the first parameter.)

Now, what you've got to do is find out which part of the code is making this change, and that's the hard part.  How long does it take the app to crash?  Does it crash under the debugger as well?  Can you step through the code in the caller right before the call to WaitForRX that leads to the crash?  We really need to look at the stack memory as well as the registers to debug this thing.

And here's where it gets tricky, because I don't think the damage is being done by this particular thread.  He already told us that when he looks at the stack frame of the calling function, all the variables look right.  If this were a stack overwrite by this thread, those would have to be overwritten first.  So my guess is that somewhere else he's sharing a set of pointers between threads, and one of those other threads is stepping on the stack here.
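
To make that theory concrete, here is a minimal sketch in portable C++ of how one thread can change a local on another thread's stack through a shared pointer.  None of this is taken from Jason's code: SharedSlot, worker, owner and checkData are names I made up, and the two constants are just the values from his report.

```cpp
#include <cstdint>
#include <thread>

// Hypothetical stand-in for state shared between threads: a pointer that one
// thread publishes and another thread later writes through.
struct SharedSlot {
    std::uint32_t* target = nullptr;
};

// Runs on the second thread: writes a status value through the shared pointer.
// It has no idea the pointer refers to someone else's stack.
void worker(SharedSlot* shared) {
    *shared->target = 0x00000001;
}

// Owns a local on its own stack, hands its address to another thread, and
// returns the value it sees once that thread has finished.
std::uint32_t owner() {
    std::uint32_t checkData = 0x02F51F54;  // lives on this thread's stack
    SharedSlot shared;
    shared.target = &checkData;

    std::thread t(worker, &shared);
    t.join();  // joining keeps this sketch well-defined

    return checkData;  // no code in owner() assigned to it, yet it changed
}
```

Here the write lands while owner() is still on the stack, so the behaviour is well-defined.  In a real bug the pointer usually outlives the frame it came from, and the write then hits whatever function happens to be using that stack slot at the time, which would look a lot like what Jason is seeing.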

Jason, can you share your app?  Can you zip it up and pass it to us?  I'll be happy to check it out myself.  I'd like to examine the stack manually and see about setting breakpoints on the stack itself.

This rsCheckData, is it on the stack or is it in a register?  If it's on the stack, check its address each time you find it corrupted.  If it's always at the same address, put a data breakpoint on that address, with the condition that the memory == 0x00000001.  Your app will run a lot slower, but eventually you should catch the culprit.

Also, when it crashes, get a dump of the stack, view it in the memory viewer and spend some time looking at it.  Figure out where the stack frame is, and copy/paste it or even print it.  Then compare it to a dump taken when you step into the function and it works correctly.  I'm interested in the current stack frame (when it crashes) and the frame of the calling function.  By frame I mean the entire section of stack being used by the function.

Take the time to section out your stack to figure out which variables are in which bytes of memory.  Then, comparing the good copy and the bad copy of both functions (current and previous), figure out what is different.  If this really is a stack overwrite, you'll find a section of bytes that look normal, followed by some junk.  The variable that is in the junk area (or the variable preceding it) is the key variable that is being used somewhere else to do the overwrite.  If only this one particular DWORD is being set to 0x00000001, then we'll have to keep thinking about it (unless you research the variable preceding it on the stack and find that it's the culprit).
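
If comparing the two dumps by eye gets tedious, the comparison itself is easy to automate.  This is only a rough sketch, assuming you have already copied each frame out of the memory viewer as a list of DWORDs; DiffRun and diffFrames are names I invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One contiguous run of DWORDs that differ between the two dumps.
struct DiffRun {
    std::size_t firstIndex;  // DWORD index of the first differing slot
    std::size_t length;      // number of consecutive differing DWORDs
};

// Compares a known-good frame dump with a dump taken at the crash and returns
// each contiguous run of differing DWORDs.  If the dumps differ in length,
// only the overlapping part is compared.
std::vector<DiffRun> diffFrames(const std::vector<std::uint32_t>& good,
                                const std::vector<std::uint32_t>& bad) {
    std::vector<DiffRun> runs;
    const std::size_t n = good.size() < bad.size() ? good.size() : bad.size();

    std::size_t i = 0;
    while (i < n) {
        if (good[i] == bad[i]) {
            ++i;
            continue;
        }
        const std::size_t start = i;
        while (i < n && good[i] != bad[i]) {
            ++i;  // extend the run while the slots keep differing
        }
        runs.push_back({start, i - start});
    }
    return runs;
}
```

A long run points towards a buffer overrun, and the variable at or just before firstIndex is the one to research.  A single run of length 1 is the "only this one DWORD" case.  Keep in mind that the two dumps have to be taken at the same point in the function, otherwise ordinary differences in locals will show up as runs too.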

If my instructions are foreign to you (I don't know how deep your knowledge goes here), keep asking questions for clarification.  I can't think of another way to proceed with this.

 

/dev

_______________________________________________
msvc mailing list
[email protected]
See http://beginthread.com/mailman/listinfo/msvc_beginthread.com for 
subscription changes, and list archive.
