Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?
On 10 May 2013, at 03:19, Bruce Tulloch wrote:
>> The compiler turns such functions into procedures with an implicit var-parameter and the *caller* passes the location where the function result should go via that parameter.
> Okay, thanks, that clarifies. Now I understand how a variable in the caller's scope can be affected by assignments to Result in the callee's scope BEFORE the callee has finished executing. Another way of stating this is: Result is a local variable of a function, initialized to nil and passed by value to the caller upon completion ONLY if Result is not a reference to a dynamic type; otherwise it's an implicit var argument with scope beyond that of the function. Is that correct?

Yes, apart from the fact that Result is never initialized to nil.

> If so, it would seem to be a bit of a semantic trap for the unwary :-)

Differences in execution caused by the above change can only occur if you have memory corruption. On the other hand, in that case anything is possible, regardless of which optimisations have or have not been performed by the compiler.

>> Such optimizations only occur in safe situations (e.g., not when assigning to a global variable...
> Does the compiler consider ANY non-local variable to be global? For example, fields of an object?

These are indeed global. And so are e.g. local variables whose address has been taken, that are used in assembler code, or that have been passed to a var-parameter (because the called routine may then have stored its address). There are no cases that I know of where the compiler can perform that optimisation in an unsafe scenario.

Jonas
___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-pascal
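Jonas's point that a local passed to a var-parameter has to be treated like a global can be illustrated with a small Free Pascal sketch. The names Remember, Saved, PAnsiStringRef and EscapeDemo are hypothetical. The write through Saved happens in the same routine only to keep the demo deterministic; in a real program it could happen in any other routine, or in another thread.

```pascal
program EscapeViaVarParam;

{$mode objfpc}{$H+}

type
  PAnsiStringRef = ^AnsiString;   // hypothetical alias for this sketch

var
  // Hypothetical global that keeps the address of some caller's variable.
  Saved: PAnsiStringRef = nil;

// A callee may legally store the address of its var-parameter.
procedure Remember(var S: AnsiString);
begin
  Saved := @S;
end;

// Returns a local after it was modified through the stored pointer,
// i.e. without any direct assignment to Local after calling Remember.
function EscapeDemo: AnsiString;
var
  Local: AnsiString;
begin
  Local := 'original';
  Remember(Local);        // Local's address has now escaped
  Saved^ := 'changed';    // a write through the pointer changes Local
  Result := Local;
  Saved := nil;           // drop the pointer before Local goes out of scope
end;

begin
  WriteLn(EscapeDemo);    // prints: changed
end.
```

Once Remember has run, nothing visible in EscapeDemo shows that Local can change, which is why the compiler cannot safely hand such a variable to a callee as the hidden result location.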
Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?
On 05/09/2013 05:19 AM, Bruce Tulloch wrote:
> This tells me that the test at the top of fpc_AnsiStr_Decr_Ref:
>
>         cmpl $0,(%eax)
>         jne  .Ldecr_ref_continue
>         ret
>     .Ldecr_ref_continue:
>
> passed (i.e. (%eax) was NOT nil) but sometime during the execution of the following code:
>
>         // Temps allocated between ebp-24 and ebp+0
>         subl $4,%esp
>         // Var S located in register
>         // Var l located in register
>         movl %eax,(%esp)
>         // [101] l:=@PAnsiRec(S-FirstOff)^.Ref;
>         movl (%eax),%edx
>         subl $8,%edx
>         // [102] If l^<0 then exit;
>         cmpl $0,(%edx)
>
> the variable (%eax) MUST have been changed (to nil) BY ANOTHER THREAD. Is there any other plausible explanation I may have missed?

SIGSEGV is caused by an access to any memory that is not mapped into the process address space, not only nil. So the first test only checks that the address is not nil; it will let other, even invalid, addresses pass.

> If there is no other explanation, then it means I need to find out how the string variable referred to by (%eax) could have been accessed (or even known to exist) by any other thread in the same address space. If that variable is local to a function (i.e. foo's Result, with the SEGV upon its assignment immediately after it first comes into scope, per my earlier email), then absent a bug in FPC's handling of string references and allocation, it seems impossible that it could be known or referenced by any other thread. I'm reasonably confident there's no other way it could be overwritten by another thread (i.e. I don't think there are any range or buffer pointer errors anywhere else), so logic tells me I must have the wrong thesis or there's a string handling error in FPC. Any clues or insight, gratefully received :-)

Result in foo is initialized with the address of the variable on the left side of the call to foo. If you have s:=foo; Result will point to s. If you just call foo; and drop the result, the compiler will create and use a hidden temp string variable. Strings are managed types and are initialized to nil. So you are looking at the wrong location for your bug. You should look at what has corrupted the string variable that receives the result of foo.

Ludo
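Ludo's description can be modelled by hand. This is a sketch under the assumption, taken from this thread, that the function result arrives as an implicit var-parameter; FooLowered and the other names are hypothetical, and the code is not what the compiler actually emits. Whether the destination itself or a temporary is passed for a real s:=foo depends on the compiler's safety analysis and version.

```pascal
program HiddenResultModel;

{$mode objfpc}{$H+}

var
  S: AnsiString;           // the caller's destination variable
  SeenMidway: AnsiString;  // what S held while the callee was still running

// Hand-written model of "function Foo: AnsiString". Assumption for
// illustration: the result is a var-parameter pointing at storage owned
// by the caller.
procedure FooLowered(var Res: AnsiString);
begin
  Res := 'partial';   // writes straight into the caller's storage
  SeenMidway := S;    // observe S before the "function" has returned
  Res := 'done';
end;

// Models "S := Foo;" when the compiler passes S itself as the destination.
function MidwayValueWhenAssigning: AnsiString;
begin
  S := 'old';
  FooLowered(S);
  Result := SeenMidway;
end;

// Models a bare "Foo;": the caller supplies a hidden temporary instead,
// which is finalized when this routine returns; S is never touched.
function MidwayValueWhenDiscarding: AnsiString;
var
  Tmp: AnsiString;
begin
  S := 'old';
  FooLowered(Tmp);
  Result := SeenMidway;
end;

begin
  WriteLn(MidwayValueWhenAssigning);   // prints: partial
  WriteLn(MidwayValueWhenDiscarding);  // prints: old
end.
```

In the first case the caller's variable already holds an intermediate value before the callee finishes, so any other thread that also writes to that variable races with the callee's own assignments to Result.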
Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?
On 09/05/2013 5:19, Bruce Tulloch wrote:
> If there is no other explanation, then it means I need to find out how the string variable referred to by (%eax) could have been accessed (or even known to exist) by any other thread in the same address space.

Hello,

In the past I suffered a problem like yours, and the culprit was a different function that passed its result (a string) as a parameter to another function without initializing it first, something like this:

    function foo(var para: string): string;
    begin
      //Something with para
    end;

    function bar(): string;
    begin
      result:=foo(result);
    end;

I hope this helps...
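A minimal sketch of a safer alternative to the bar example above, assuming a hypothetical body for foo (the original only says //Something with para): hand foo an ordinary local, which the compiler initializes like any other managed variable, instead of the function's own result.

```pascal
program InitializedVarArgument;

{$mode objfpc}{$H+}

// Hypothetical body for foo: it reads and modifies its var-parameter.
function Foo(var Para: AnsiString): AnsiString;
begin
  Para := Para + '!';
  Result := Para;
end;

// The pattern from the message, shown only as a comment:
//   function bar(): string;
//   begin
//     result:=foo(result);   // foo reads bar's result before bar set it
//   end;

// Safer variant: pass an ordinary local instead of Result.
function Bar: AnsiString;
var
  Tmp: AnsiString;
begin
  // Tmp is a managed local, so the compiler already initializes it to nil;
  // the explicit assignment only documents the intent.
  Tmp := '';
  Result := Foo(Tmp);
end;

begin
  WriteLn(Bar);   // prints: !
end.
```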
Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?
Thanks Ludo, but I know the value in (%eax) in this case is nil (see the CPU register dump in my email) because the address of the string's reference count (in edx) is 0xfffffff8 (which is 8 less than nil), per the instruction just before the one that fails with SEGV. The SEGV itself is caused by an attempt to read the address in edx, i.e. 0xfffffff8, at the instruction cmpl $0,(%edx). The corruption is not occurring when the return value of foo is used; it's occurring when the Result variable in foo is first assigned (a valid string, '') as Result first appears in the scope of the body of foo.

Thanks for your feedback.

Cheers,
Bruce.

On Thu, May 9, 2013 at 4:21 PM, Ludo Brands ludo.bra...@free.fr wrote:
> [...]
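The arithmetic behind the 0xfffffff8 in edx can be checked with a small model. It assumes the 8-byte offset from the listing (subl $8,%edx) and a hypothetical string address $08050010; RefCountAddress is an illustrative name, not an RTL routine. In the listing, the nil test and the address computation each load the string variable separately, so a nil stored between the two loads yields 0 - 8, which wraps to $FFFFFFF8 in 32-bit arithmetic.

```pascal
program NilMinusHeader;

{$mode objfpc}{$H+}
{$R-}{$Q-}  // allow 32-bit wrap-around in the address arithmetic below

const
  // Offset from the string data back to the reference count, taken from
  // "subl $8,%edx" in the 32-bit listing; an assumption, not portable.
  RefOffset = 8;

// Models the two separate loads of the string variable in the listing:
// FirstLoad feeds the nil test, SecondLoad feeds the address computation.
// Returns 0 when the routine would exit early, otherwise the address that
// "cmpl $0,(%edx)" would dereference.
function RefCountAddress(FirstLoad, SecondLoad: LongWord): LongWord;
begin
  if FirstLoad = 0 then
    Exit(0);                                  // cmpl $0,(%eax); ... ret
  Result := LongWord(SecondLoad - RefOffset); // movl (%eax),%edx; subl $8,%edx
end;

begin
  // Hypothetical string address; the variable becomes nil between loads.
  WriteLn(HexStr(Int64(RefCountAddress($08050010, 0)), 8));   // prints: FFFFFFF8
end.
```

If both loads see the same valid pointer, the result is simply the pointer minus 8; only a change between the two loads (or a garbage non-nil value, as Ludo points out) produces an unmapped address.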
Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?
Thanks José, I can see that might cause a problem given bar passes result by reference to foo without initializing result first. My question to Jonas or others more knowledgeable than me about what the compiler does, is whether result (in your example and my own case) is guaranteed to be initialized to nil when it first appears in scope (i.e. before it's been assigned any value in our code). If it is initialized to nil, then foo would receive a reference to bar's result variable (via para) and the value of that variable would be nil (and all would be okay). If it isn't initialized to nil, the same rule applies but the value of result (as seen by foo via para) would likely be invalid and would probably blow up in foo when dereferenced (as a string). My problem is similar except that I know it's not nil when passed in (because the initial test in fpc_AnsiStr_Decr_Ref looking for nil passes) but that it becomes nil very soon afterward (because the SEGV arises as an indirect result of it being nil, as I explained in my reply to Ludo just now). I'm pretty sure I have a shared memory problem somewhere between threads in my code but I can't understand how this could be given the erroneously shared variable appears to be an automatic variable (i.e. Result) that has just been created on the stack in the function foo that calls fpc_AnsiStr_Decr_Ref where the SEGV occurs. I'll keep looking :-) Bruce. 
On Thu, May 9, 2013 at 9:48 PM, José Mejuto joshy...@gmail.com wrote:
> [...]
Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?
On 09 May 2013, at 14:39, Bruce Tulloch wrote:
> Thanks José, I can see that might cause a problem given bar passes result by reference to foo without initializing result first. My question to Jonas or others more knowledgeable than me about what the compiler does, is whether result (in your example and my own case) is guaranteed to be initialized to nil when it first appears in scope (i.e. before it's been assigned any value in our code).

Every instance of an automated type, whether it was explicitly declared or implicitly created as a temp, initially gets the value nil. However, as Michael and Ludo explained, the result variable of a function returning an ansistring/unicodestring is not created inside that function itself. The compiler turns such functions into procedures with an implicit var-parameter and the *caller* passes the location where the function result should go via that parameter. This location can be a temporary location, but the compiler can also optimize this by directly passing the location of the variable to which you assign the result of that function call.

Such optimizations only occur in safe situations (e.g., not when assigning to a global variable, because otherwise assigning something to the function result would immediately change the value of that global variable too), but as Ludo explains, this means that you are looking in the wrong place for the data race. So you are probably writing in two threads to whatever you are assigning the result of that function to.

Jonas
Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?
On 09.05.2013 14:39, Bruce Tulloch wrote:
> [...] I'm pretty sure I have a shared memory problem somewhere between threads in my code but I can't understand how this could be given the erroneously shared variable appears to be an automatic variable (i.e. Result) that has just been created on the stack in the function foo that calls fpc_AnsiStr_Decr_Ref where the SEGV occurs. I'll keep looking :-) Bruce.

Do you play around with pointers anywhere? I once had a case where I overwrote something in a parent stack frame, so maybe you could by accident access the memory location of the Result variable...

Regards,
Sven
Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?
> The compiler turns such functions into procedures with an implicit var-parameter and the *caller* passes the location where the function result should go via that parameter.

Okay, thanks, that clarifies. Now I understand how a variable in the caller's scope can be affected by assignments to Result in the callee's scope BEFORE the callee has finished executing. Another way of stating this is: Result is a local variable of a function, initialized to nil and passed by value to the caller upon completion ONLY if Result is not a reference to a dynamic type; otherwise it's an implicit var argument with scope beyond that of the function. Is that correct? If so, it would seem to be a bit of a semantic trap for the unwary :-)

> Such optimizations only occur in safe situations (e.g., not when assigning to a global variable...

Does the compiler consider ANY non-local variable to be global? For example, fields of an object?

> So you are probably writing in two threads to whatever you are assigning the result of that function to.

Yep, makes sense; we will look carefully to see if that's what we're doing. The functions concerned are actually methods of the TBlockSocket class of the synapse library. We use an instance of this class in two threads: one sending, the other receiving. These threads have full shared-memory protection in our own code, but looking at the TBlockSocket implementation I can see at least one suspect: FLastErrorDesc. This field is changed by methods that send and receive on the socket, which means it's assigned values in the context of two different threads (given our usage). Indeed, it suggests TBlockSocket is not thread-safe as currently coded. Looks like a smoking gun to me.

Thanks one and all for all your helpful feedback!

Bruce.
On Thu, May 9, 2013 at 10:55 PM, Jonas Maebe jonas.ma...@elis.ugent.be wrote:
> [...]
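One way to remove the kind of race Bruce found is to route every read and write of the shared string through a lock. The sketch below assumes a hypothetical TErrorDescHolder class standing in for the shared FLastErrorDesc field, driven by two hypothetical writer threads; it is not synapse's API, and the fix actually adopted for TBlockSocket is not shown in this thread.

```pascal
program GuardedErrorDesc;
{$mode objfpc}{$H+}
uses {$ifdef unix}cthreads,{$endif} Classes, SysUtils, SyncObjs;

type
  // Hypothetical holder for a string written by a sending and a receiving
  // thread; not synapse's API, only the locking pattern.
  TErrorDescHolder = class
  private
    FLock: TCriticalSection;
    FDesc: AnsiString;
  public
    constructor Create;
    destructor Destroy; override;
    procedure SetDesc(const Value: AnsiString);
    function GetDesc: AnsiString;
  end;

  TWriter = class(TThread)   // hypothetical worker thread
  private
    FHolder: TErrorDescHolder;
    FPrefix: AnsiString;
  protected
    procedure Execute; override;
  public
    constructor Create(AHolder: TErrorDescHolder; const APrefix: AnsiString);
  end;

constructor TErrorDescHolder.Create;
begin
  inherited Create;
  FLock := TCriticalSection.Create;
end;

destructor TErrorDescHolder.Destroy;
begin
  FLock.Free;
  inherited Destroy;
end;

procedure TErrorDescHolder.SetDesc(const Value: AnsiString);
begin
  FLock.Enter;
  try
    FDesc := Value;   // the old value is released while the lock is held
  finally
    FLock.Leave;
  end;
end;

function TErrorDescHolder.GetDesc: AnsiString;
begin
  FLock.Enter;
  try
    Result := FDesc;  // take our own reference before FDesc can change
  finally
    FLock.Leave;
  end;
end;

constructor TWriter.Create(AHolder: TErrorDescHolder; const APrefix: AnsiString);
begin
  inherited Create(True);   // created suspended; started by the caller
  FHolder := AHolder;
  FPrefix := APrefix;
end;

procedure TWriter.Execute;
var
  I: Integer;
begin
  for I := 1 to 10000 do
    FHolder.SetDesc(FPrefix + IntToStr(I));
end;

// Runs two writers against one holder and returns the final description.
function RunTwoWriters: AnsiString;
var
  Holder: TErrorDescHolder;
  Sender, Receiver: TWriter;
begin
  Holder := TErrorDescHolder.Create;
  Sender := TWriter.Create(Holder, 'send error ');
  Receiver := TWriter.Create(Holder, 'recv error ');
  try
    Sender.Start;
    Receiver.Start;
    Sender.WaitFor;
    Receiver.WaitFor;
    Result := Holder.GetDesc;
  finally
    Sender.Free;
    Receiver.Free;
    Holder.Free;
  end;
end;

begin
  WriteLn(RunTwoWriters);   // "send error 10000" or "recv error 10000"
end.
```

Copying the string under the lock in GetDesc matters as much as guarding SetDesc: it takes a reference to the current value before another thread can release it.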
[fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?
After a random but very long period of time (i.e. very many successful calls) I get a SEGV in the built-in function fpc_AnsiStr_Decr_Ref. GDB reports the argument to fpc_AnsiStr_Decr_Ref (the string whose reference count is to be decremented) is nil (i.e. 0x0). Prima facie, that's the reason for the SEGV, but how is it possible that the compiler would pass a nil pointer to this function in the first place?

To put this into context, I'm running FPC 2.6.2 on a 32-bit Linux system, executing a multi-threaded application (which uses both Python threads and FPC threads). I have not found obvious evidence of memory corruption from other execution contexts or shared memory handling problems.

The SEGV occurs when called from a function, let's call it foo, that looks like this:

    function foo : AnsiString;
    begin
      Result := '';
      // other stuff
    end;

The AnsiString pointer on which fpc_AnsiStr_Decr_Ref throws a SEGV is Result, at the first line of the function foo. It appears the compiler is passing Result to fpc_AnsiStr_Decr_Ref even though Result (at this point in the function) must be nil (having only just come into scope). How is it possible that fpc_AnsiStr_Decr_Ref is being called at all?

Any/all advice gratefully received.

Cheers,
Bruce.
Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?
On Wed, 8 May 2013, Bruce Tulloch wrote:
> [...] The AnsiString pointer on which fpc_AnsiStr_Decr_Ref throws a SEGV is Result, at the first line of the function foo. It appears the compiler is passing Result to fpc_AnsiStr_Decr_Ref even though Result (at this point in the function) must be nil (having only just come into scope).

This is not correct. Result is NOT guaranteed to be nil. About a year ago, I was as surprised as you are to discover this, but it is so. It is even so in Delphi.

> How is it possible that fpc_AnsiStr_Decr_Ref is being called at all?

Roughly: the caller gives the address of the location where the result must go. The function receives this address and then treats it as a normal variable, meaning that as soon as it is used, fpc_AnsiStr_Decr_Ref and friends come into play. The exact behaviour also depends on the compiler version. One of the compiler maintainers can describe this in more detail.

Michael.
Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?
On 08 May 2013, at 08:13, Bruce Tulloch wrote:
> After a random but very long period of time (i.e. very many successful calls) I get a SEGV in the built-in function fpc_AnsiStr_Decr_Ref. GDB reports the argument to fpc_AnsiStr_Decr_Ref (the string whose reference count is to be decremented) is nil (i.e. 0x0). Prima facie, that's the reason for the SEGV, but how is it possible that the compiler would pass a nil pointer to this function in the first place?

The first thing fpc_AnsiStr_Decr_Ref does is check whether its parameter is nil, and if so it immediately exits. It can be nil if the ansistring contains an empty string. That routine itself also sets its argument to nil if this was not the case initially (it's a var-parameter), and I assume your crash happens after this has been done.

> To put this into context, I'm running FPC 2.6.2 on a 32-bit Linux system, executing a multi-threaded application (which uses both Python threads and FPC threads). I have not found obvious evidence of memory corruption from other execution contexts or shared memory handling problems.

It's nevertheless most likely memory corruption. You can try compiling with -gv and running your program under valgrind to see whether it finds anything (you will probably get some false positives about certain RTL pchar routines such as strscan and strlen, but you can ignore those).

Jonas
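The steps Jonas describes can be sketched with a toy reference-counted record. TToyRec, ToyDecrRef and the other names are hypothetical and the layout is not the RTL's; the sketch only follows the described operations: nil check, clearing the caller's variable, constant check, atomic decrement, and freeing on zero. The exact order of the clearing step relative to the others is an assumption.

```pascal
program ToyDecrRef;

{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for the hidden header of a reference-counted
  // string; this is NOT the RTL's real record layout.
  PToyRec = ^TToyRec;
  TToyRec = record
    Ref: LongInt;   // a negative count marks constant data
  end;

var
  FreedCount: Integer = 0;

// Sketch of the steps described for fpc_AnsiStr_Decr_Ref. The nil test is
// the only validation: any non-nil garbage in P is dereferenced below.
procedure ToyDecrRef(var P: PToyRec);
var
  Rec: PToyRec;
begin
  if P = nil then
    Exit;               // empty string: nothing to release
  Rec := P;             // copy the pointer before clearing the caller's variable
  P := nil;             // the var-parameter is cleared
  if Rec^.Ref < 0 then
    Exit;               // constant data is never freed
  if InterLockedDecrement(Rec^.Ref) = 0 then
  begin
    Dispose(Rec);       // last reference gone
    Inc(FreedCount);
  end;
end;

// Hypothetical helper that allocates a toy header with a given count.
function NewToy(Ref: LongInt): PToyRec;
begin
  New(Result);
  Result^.Ref := Ref;
end;

// Two references to one header, released one after the other.
function FreedAfterTwoReleases: Integer;
var
  A, B: PToyRec;
begin
  FreedCount := 0;
  A := NewToy(2);
  B := A;             // second reference to the same header
  ToyDecrRef(A);      // 2 -> 1: A cleared, nothing freed
  ToyDecrRef(B);      // 1 -> 0: header freed, B cleared
  ToyDecrRef(B);      // B is nil now: returns immediately
  Result := FreedCount;
end;

begin
  WriteLn(FreedAfterTwoReleases);   // prints: 1
end.
```

The sketch also makes Ludo's point concrete: because the nil test is the only check, a caller's variable holding any non-nil garbage is dereferenced as if it pointed just past a valid header.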
Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?
Thanks Jonas, that confirms what I suspected. Next time I trap an instance of this (rare) fault I will inspect exactly which CPU instruction raised the SEGV inside fpc_AnsiStr_Decr_Ref, in search of a source of memory corruption.

Bruce.

On Wed, May 8, 2013 at 11:49 PM, Jonas Maebe jonas.ma...@elis.ugent.be wrote:
> [...]
Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?
Michael, thanks for your feedback. One thing confuses me in light of Jonas' reply: if what you say is correct (that local variables that have just come into scope are not guaranteed to be nil), then the assignment Result := ''; at the first line of foo may arbitrarily SEGV, because fpc_AnsiStr_Decr_Ref will interpret the (possibly) non-nil value of Result as an AnsiString which (being a random uninitialized value) will likely be invalid and blow up. Surely the semantics of string handling rely on FPC guaranteeing that automatic variables are always preassigned nil when they come into scope?

Put another way, how do fpc_AnsiStr_Decr_Ref and friends, which receive the address of the caller's Result variable via their var parameter, know whether the value of this parameter (which may not be initialized, if what you say is correct) is a valid string?

Bruce.

On Wed, May 8, 2013 at 5:18 PM, Michael Van Canneyt mich...@freepascal.org wrote:
> [...]
Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?
I've not managed to trap it again, but based on the information I have from the last time it occurred I can say the error happened here:

    --- a/rtl/i386/i386.inc
    +++ b/rtl/i386/i386.inc
    @@ -1523,7 +1523,7 @@
             movl (%eax),%edx
             subl $8,%edx
     // [102] If l^<0 then exit;
             cmpl $0,(%edx)        <-- SEGV OCCURS HERE
             jl .Lj3596
     .Lj3603:
     // [104] If declocked(l^) then

That is, when testing the reference count, the address of the reference count field appears to be duff. I don't know what %edx was pointing to at the time (I hope to know next time I trap it) but it was obviously wrong.

-b

On Thu, May 9, 2013 at 9:33 AM, Bruce Tulloch pas...@causal.com wrote:
> [...]
Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?
So here's some more diagnostic information at the point of the SEGV:

    (gdb) disass
    Dump of assembler code for function _$SYSTEM$_Ll1637:
    => 0x0118ace1 <+0>:    cmpl   $0x0,(%edx)
    End of assembler dump.
    (gdb) i reg
    eax            0xb6c77158       -1228443304
    ecx            0xb6c76c04       -1228444668
    edx            0xfffffff8       -8
    ebx            0x12adbf8        19586040
    esp            0xb6c75f5c       0xb6c75f5c
    ebp            0xb6c75f70       0xb6c75f70
    esi            0xb6c77020       -1228443616
    edi            0xb6c77020       -1228443616
    eip            0x118ace1        0x118ace1 <_$SYSTEM$_Ll1637>
    eflags         0x210293         [ CF AF SF IF RF ID ]
    cs             0x73             115
    ss             0x7b             123
    ds             0x7b             123
    es             0x7b             123
    fs             0x0              0
    gs             0x33             51
    (gdb) p $eax^
    $4 = 0

This tells me that the test at the top of fpc_AnsiStr_Decr_Ref:

        cmpl $0,(%eax)
        jne  .Ldecr_ref_continue
        ret
    .Ldecr_ref_continue:

passed (i.e. (%eax) was NOT nil) but sometime during the execution of the following code:

        // Temps allocated between ebp-24 and ebp+0
        subl $4,%esp
        // Var S located in register
        // Var l located in register
        movl %eax,(%esp)
        // [101] l:=@PAnsiRec(S-FirstOff)^.Ref;
        movl (%eax),%edx
        subl $8,%edx
        // [102] If l^<0 then exit;
        cmpl $0,(%edx)

the variable (%eax) MUST have been changed (to nil) BY ANOTHER THREAD. Is there any other plausible explanation I may have missed?

If there is no other explanation, then it means I need to find out how the string variable referred to by (%eax) could have been accessed (or even known to exist) by any other thread in the same address space. If that variable is local to a function (i.e. foo's Result, with the SEGV upon its assignment immediately after it first comes into scope, per my earlier email), then absent a bug in FPC's handling of string references and allocation, it seems impossible that it could be known or referenced by any other thread.

I'm reasonably confident there's no other way it could be overwritten by another thread (i.e. I don't think there are any range or buffer pointer errors anywhere else), so logic tells me I must have the wrong thesis or there's a string handling error in FPC.

Any clues or insight, gratefully received :-)

Cheers,
Bruce.
PS: I can't use valgrind in practice for a variety of reasons, not least because I'm not likely to see the error for an extraordinarily long time: slight changes to the (execution time of the) code made so far have had a dramatic effect on the likelihood of this problem occurring at all. But it's clearly some sort of race condition over unprotected memory somewhere.

On Thu, May 9, 2013 at 9:47 AM, Bruce Tulloch pas...@causal.com wrote:
> [...]