Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

2013-05-13 Thread Jonas Maebe


On 10 May 2013, at 03:19, Bruce Tulloch wrote:


  The compiler turns such functions into procedures with an implicit
  var-parameter and the *caller* passes the location where the function
  result should go via that parameter.

 Okay, thanks, that clarifies, now I understand how a variable in the
 caller's scope can be affected while making assignments to Result in
 the callee's scope BEFORE callee has finished executing.

 Another way of stating this is: Result is a local variable of a
 function, initialized to nil and passed by value to the caller upon
 completion ONLY if Result is not a reference to a dynamic type;
 otherwise it's an implicit var argument with scope beyond that of the
 function.

 Is that correct?


Yes, apart from the fact that result is never initialized to nil.


 If so, it would seem to be a bit of a semantic trap for the
 unwary :-)


Differences in the execution because of the above change can only
occur in case you have memory corruption. On the other hand, in that
case anything is possible regardless of what optimisations have or have
not been performed by the compiler.



  Such optimizations only occur in safe situations (e.g., not when
  assigning to a global variable...

 Does the compiler consider ANY non-local variable to be global?

 For example, fields of an object?


These are indeed global. And so are e.g. local variables whose address  
has been taken, that are used in assembler code, or that have been  
passed to a var-parameter (because the called routine may then have  
stored its address). There are no cases that I know of where the  
compiler can perform that optimisation in an unsafe scenario.



Jonas
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-pascal


Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

2013-05-09 Thread Ludo Brands
On 05/09/2013 05:19 AM, Bruce Tulloch wrote:
 
 This tells me that the test at the top of fpc_AnsiStr_Decr_Ref:
 
 cmpl $0,(%eax)
 jne .Ldecr_ref_continue
 ret
 .Ldecr_ref_continue:
 
 passed (i.e. (%eax) was NOT nil) but sometime during the execution of
 the following code:
 
 // Temps allocated between ebp-24 and ebp+0
 subl $4,%esp
 // Var S located in register
 // Var l located in register
 movl %eax,(%esp)
 // [101] l:=@PAnsiRec(S-FirstOff)^.Ref;
 movl (%eax),%edx
 subl $8,%edx
 // [102] If l^<0 then exit;
 cmpl $0,(%edx)
 
 the variable (%eax) MUST have been changed (to nil) BY ANOTHER THREAD.

 Is there any other plausible explanation I may have missed?


SIGSEGV is caused by an access to any memory outside the process address
space. Not only nil. So the first test only checks if the address is not
nil but will let other, even invalid, addresses pass on.


 If there is no other explanation, then it means I need to find out how
 the string variable referred to by (%eax) could have been accessed
 (or even known to exist) by any other thread in the same address space.
 
 If that variable is local to a function (i.e. foo's Result with SEGV
 upon its assignment immediately it first comes into scope, per my
 earlier email) then absent a bug in FPC's handling of string references
 and allocation, it seems impossible that it could be known or referenced
 by any other thread.
 
 I'm reasonably confident there's no other way it could be overwritten by
 another thread (i.e. I don't think there are any range or buffer pointer
 errors anywhere else) so logic tells me I must have the wrong thesis or
 there's a string handling error in FPC.
 
 Any clues or insight, gratefully received :-)
 

Result in foo is initialized with the address of the left side variable
in the call to foo. If you have
  s:=foo;
result will point to s. If you just call
  foo;
and drop the result, the compiler will create and use a hidden temp
string variable. Strings are managed types and initialized to nil.

So you are looking at the wrong location for your bug. You should look
at what has corrupted the string variable that receives the result of foo.

Ludo


Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

2013-05-09 Thread José Mejuto

El 09/05/2013 5:19, Bruce Tulloch escribió:


 If there is no other explanation, then it means I need to find out how
 the string variable referred to by (%eax) could have been accessed
 (or even known to exist) by any other thread in the same address space.


Hello,

I once had a problem like yours, and the culprit was a different
function that passed its own result (a string) as a parameter to
another function without initializing it first, something like this:


function foo(var para: string): string;
begin
  //Something with para
end;

function bar(): string;
begin
  result:=foo(result);
end;
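
A defensive variant of the pattern above can be sketched as follows. This is only an illustration: the body of Foo is made up, and the single point being shown is that Bar assigns Result before anything reads it, so the callee never sees whatever the caller's destination happened to contain.

```pascal
program ResultAsVarArg;
{$mode objfpc}{$H+}

// Hypothetical callee: reads and modifies the string it receives by reference.
function Foo(var Para: AnsiString): AnsiString;
begin
  Para := Para + '!';
  Result := Para;
end;

function Bar: AnsiString;
begin
  // Give Result a defined value first; on entry it holds whatever the
  // caller's destination happened to contain.
  Result := '';
  Result := Foo(Result);
end;

begin
  WriteLn(Bar);
end.
```

Without the first assignment in Bar, what Foo reads through Para depends on the caller and on the compiler version, as discussed elsewhere in this thread.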

I hope this helps...


Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

2013-05-09 Thread Bruce Tulloch
Thanks Ludo, but I know the value in (%eax) in this case is nil (see the
CPU register dump in my email) because the address of the string length (in
edx) is 0xfffffff8 (which is 8 less than nil) per the instruction just
before the one that fails with SEGV. The SEGV itself is caused by an
attempt to read from the address in edx, i.e. 0xfffffff8, at the
instruction cmpl $0,(%edx).

The corruption is not occurring when the return value of foo is used; it's
occurring when the Result variable in foo is first assigned (a valid
string, '') when Result first appears in the scope of the body of the
function foo.

Thanks for your feedback. Cheers, Bruce.



Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

2013-05-09 Thread Bruce Tulloch
Thanks José, I can see that might cause a problem given bar passes result
by reference to foo without initializing result first. My question to Jonas
or others more knowledgeable than me about what the compiler does, is
whether result (in your example and my own case) is guaranteed to be
initialized to nil when it first appears in scope (i.e. before it's been
assigned any value in our code). If it is initialized to nil, then foo
would receive a reference to bar's result variable (via para) and the value
of that variable would be nil (and all would be okay). If it isn't
initialized to nil, the same rule applies but the value of result (as seen
by foo via para) would likely be invalid and would probably blow up in foo
when dereferenced (as a string).

My problem is similar except that I know it's not nil when passed in
(because the initial test in fpc_AnsiStr_Decr_Ref looking for nil passes)
but that it becomes nil very soon afterward (because the SEGV arises as an
indirect result of it being nil, as I explained in my reply to Ludo just
now).

I'm pretty sure I have a shared memory problem somewhere between threads in
my code, but I can't understand how this could be, given that the
erroneously shared variable appears to be an automatic variable (i.e.
Result) that has just been created on the stack in the function foo that
calls fpc_AnsiStr_Decr_Ref where the SEGV occurs.

I'll keep looking :-) Bruce.



Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

2013-05-09 Thread Jonas Maebe

On 09 May 2013, at 14:39, Bruce Tulloch wrote:

 Thanks José, I can see that might cause a problem given bar passes result
 by reference to foo without initializing result first. My question to Jonas
 or others more knowledgeable than me about what the compiler does, is
 whether result (in your example and my own case) is guaranteed to be
 initialized to nil when it first appears in scope (i.e. before it's been
 assigned any value in our code).

Every instance of an automated type, whether it was explicitly declared or 
implicitly created as a temp, initially gets the value nil.

However, as Michael and Ludo explained, the result variable of a function 
returning an ansistring/unicodestring is not created inside that function 
itself. The compiler turns such functions into procedures with an implicit 
var-parameter and the *caller* passes the location where the function result 
should go via that parameter. This location can be a temporary location, but 
the compiler can also optimize this by directly passing the location of the 
variable to which you assign the result of that function call. Such 
optimizations only occur in safe situations (e.g., not when assigning to a 
global variable, because otherwise assigning something to the function result 
would immediately change the value of that global variable too), but as Ludo 
explains this means that you are looking in the wrong place for the data race.

So you are probably writing in two threads to whatever you are assigning the 
result of that function to.
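
To make the transformation described above concrete, here is a minimal sketch. FooLowered and its parameter name Dest are hypothetical; they are a hand-written approximation of how such a function behaves, not the code the compiler actually generates.

```pascal
program HiddenResultParam;
{$mode objfpc}{$H+}

// What the programmer writes.
function Foo: AnsiString;
begin
  Result := 'from Foo';
end;

// Hand-written approximation: the caller supplies the destination.
procedure FooLowered(var Dest: AnsiString);
begin
  // Dest already holds whatever the caller's variable held.
  WriteLn('on entry Dest = "', Dest, '"');
  Dest := 'from FooLowered';
end;

procedure Demo;
var
  S: AnsiString;
begin
  S := 'old value';
  FooLowered(S);  // explicit form of "S := Foo" when S is passed directly
  WriteLn(S);
  S := Foo;
  WriteLn(S);
end;

begin
  Demo;
end.
```

In the explicit form it is plain that an assignment to Dest releases the string the caller's variable held before, which is where fpc_AnsiStr_Decr_Ref comes in, and that any other thread writing to the same variable races with that release.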


Jonas


Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

2013-05-09 Thread Sven Barth

On 09.05.2013 14:39, Bruce Tulloch wrote:

 Thanks José, I can see that might cause a problem given bar passes
 result by reference to foo without initializing result first. My
 question to Jonas or others more knowledgeable than me about what the
 compiler does, is whether result (in your example and my own case) is
 guaranteed to be initialized to nil when it first appears in scope (i.e.
 before it's been assigned any value in our code). If it is initialized
 to nil, then foo would receive a reference to bar's result variable (via
 para) and the value of that variable would be nil (and all would be
 okay). If it isn't initialized to nil, the same rule applies but the
 value of result (as seen by foo via para) would likely be invalid and
 would probably blow up in foo when dereferenced (as a string).

 My problem is similar except that I know it's not nil when passed in
 (because the initial test in fpc_AnsiStr_Decr_Ref looking for nil
 passes) but that it becomes nil very soon afterward (because the SEGV
 arises as an indirect result of it being nil, as I explained in my reply
 to Ludo just now).

 I'm pretty sure I have a shared memory problem somewhere between threads
 in my code, but I can't understand how this could be, given that the
 erroneously shared variable appears to be an automatic variable (i.e.
 Result) that has just been created on the stack in the function foo that
 calls fpc_AnsiStr_Decr_Ref where the SEGV occurs.

 I'll keep looking :-) Bruce.


Do you play around with pointers anywhere? I once had it that I 
overwrote something in a parent stackframe, so maybe you could by 
accident access the memory location of the Result variable...


Regards,
Sven



Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

2013-05-09 Thread Bruce Tulloch
 The compiler turns such functions into procedures with an implicit
 var-parameter and the *caller* passes the location where the function
 result should go via that parameter.

Okay, thanks, that clarifies, now I understand how a variable in the
caller's scope can be affected while making assignments to Result in the
callee's scope BEFORE callee has finished executing.

Another way of stating this is: Result is a local variable of a function,
initialized to nil and passed by value to the caller upon completion ONLY
if Result is not a reference to a dynamic type; otherwise it's an implicit
var argument with scope beyond that of the function.

Is that correct? If so, it would seem to be a bit of a semantic trap for
the unwary :-)

 Such optimizations only occur in safe situations (e.g., not when
 assigning to a global variable...

Does the compiler consider ANY non-local variable to be global?

For example, fields of an object?

 So you are probably writing in two threads to whatever you are assigning
 the result of that function to.

Yep, makes sense, we will look carefully to see if that's what we're doing.

The functions concerned are actually methods of the TBlockSocket class of
the synapse library. We use an instance of this class in two threads; one
sending, the other receiving.

These threads have full shared memory protection in our own code but having
a look at the TBlockSocket implementation I can see at least one suspect;
FLastErrorDesc.

This field is changed by methods that send and receive on the socket which
means it's assigned values in the context of two different threads (given
our usage). Indeed it suggests TBlockSocket is not thread safe as currently
coded. Looks like a smoking gun to me.
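
One way to guard a string field that two threads assign is sketched below. This is not Synapse code: TSharedStatus and TWriter are hypothetical stand-ins, and the sketch assumes that serializing every access to the field with one of the RTL's critical sections is acceptable for the application.

```pascal
program GuardedField;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}
  Classes, SysUtils;

type
  // Hypothetical stand-in for an object used by a sender and a receiver thread.
  TSharedStatus = class
  private
    FLock: TRTLCriticalSection;
    FLastErrorDesc: AnsiString;
  public
    constructor Create;
    destructor Destroy; override;
    procedure SetLastErrorDesc(const Value: AnsiString);
    function GetLastErrorDesc: AnsiString;
  end;

  // Worker that repeatedly overwrites the shared string field.
  TWriter = class(TThread)
  private
    FStatus: TSharedStatus;
    FTag: AnsiString;
  protected
    procedure Execute; override;
  public
    constructor Create(AStatus: TSharedStatus; const ATag: AnsiString);
  end;

constructor TSharedStatus.Create;
begin
  inherited Create;
  InitCriticalSection(FLock);
end;

destructor TSharedStatus.Destroy;
begin
  DoneCriticalSection(FLock);
  inherited Destroy;
end;

procedure TSharedStatus.SetLastErrorDesc(const Value: AnsiString);
begin
  EnterCriticalSection(FLock);
  try
    FLastErrorDesc := Value;  // old value is released while holding the lock
  finally
    LeaveCriticalSection(FLock);
  end;
end;

function TSharedStatus.GetLastErrorDesc: AnsiString;
begin
  EnterCriticalSection(FLock);
  try
    Result := FLastErrorDesc;
  finally
    LeaveCriticalSection(FLock);
  end;
end;

constructor TWriter.Create(AStatus: TSharedStatus; const ATag: AnsiString);
begin
  // Set the fields before the thread can start running.
  FStatus := AStatus;
  FTag := ATag;
  inherited Create(False);
end;

procedure TWriter.Execute;
var
  I: Integer;
begin
  for I := 1 to 100000 do
    FStatus.SetLastErrorDesc(FTag + IntToStr(I));
end;

var
  Status: TSharedStatus;
  SendThread, RecvThread: TWriter;
begin
  Status := TSharedStatus.Create;
  SendThread := TWriter.Create(Status, 'send ');
  RecvThread := TWriter.Create(Status, 'recv ');
  SendThread.WaitFor;
  RecvThread.WaitFor;
  WriteLn('last description: ', Status.GetLastErrorDesc);
  SendThread.Free;
  RecvThread.Free;
  Status.Free;
end.
```

Which thread writes last is not deterministic, so the final line varies between runs. The point is only that the assignment, and the release of the previous string value that comes with it, can no longer interleave between the two threads.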

Thanks one and all for all your helpful feedback!

Bruce.




[fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

2013-05-08 Thread Bruce Tulloch
After a random but very long period of time (i.e. very many successful
calls) I get a SEGV in the built-in function fpc_AnsiStr_Decr_Ref.

GDB reports the argument to fpc_AnsiStr_Decr_Ref (the string whose
reference is to be decremented) is nil (i.e. 0x0).

Prima facie, that's the reason for the SEGV, but how is it possible that
the compiler would pass a nil pointer to this function in the first place?

To put this into context, I'm running FPC 2.6.2 on a 32 bit Linux system
executing in a multi-threaded application (which uses python threads and
fpc threads). I have not found obvious evidence of memory corruption from
other execution contexts or shared memory handling problems.

The SEGV occurs when called from a function, let's call it foo, that looks
like this:

function foo : AnsiString;
begin
  Result := '';
  // other stuff
end;

The AnsiString pointer on which fpc_AnsiStr_Decr_Ref throws a SEGV is
Result, at the first line of the function foo.

It appears the compiler is passing Result to fpc_AnsiStr_Decr_Ref even
though Result (at this point in the function) must be nil (having only just
come into scope).

How is it possible that fpc_AnsiStr_Decr_Ref is being called at all?

Any/all advice gratefully received.

Cheers, Bruce.

Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

2013-05-08 Thread Michael Van Canneyt



On Wed, 8 May 2013, Bruce Tulloch wrote:


 After a random but very long period of time (i.e. very many successful
 calls) I get a SEGV in the built-in function fpc_AnsiStr_Decr_Ref.

 GDB reports the argument to fpc_AnsiStr_Decr_Ref (the string whose
 reference is to be decremented) is nil (i.e. 0x0).

 Prima facie, that's the reason for the SEGV, but how is it possible that
 the compiler would pass a nil pointer to this function in the first place?

 To put this into context, I'm running FPC 2.6.2 on a 32 bit Linux system
 executing in a multi-threaded application (which uses python threads and
 fpc threads). I have not found obvious evidence of memory corruption from
 other execution contexts or shared memory handling problems.

 The SEGV occurs when called from a function, let's call it foo, that
 looks like this:

 function foo : AnsiString;
 begin
   Result := '';
   // other stuff
 end;

 The AnsiString pointer on which fpc_AnsiStr_Decr_Ref throws a SEGV is
 Result, at the first line of the function foo.

 It appears the compiler is passing Result to fpc_AnsiStr_Decr_Ref even
 though Result (at this point in the function) must be nil (having only
 just come into scope).


This is not correct. Result is NOT guaranteed to be nil.

About a year ago,  I was as surprised as you are to discover this, but it is so.
It is even so in Delphi.


 How is it possible that fpc_AnsiStr_Decr_Ref is being called at all?


Roughly:

What happens is that the caller gives the address of the location where the 
result must go.
The function receives this address, and then treats it as a normal variable, meaning that 
as soon as it is used,  fpc_AnsiStr_Decr_Ref and friends come into play.


The exact behaviour also depends on the compiler version.

One of the compiler maintainers can describe this in more detail.

Michael.

Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

2013-05-08 Thread Jonas Maebe


On 08 May 2013, at 08:13, Bruce Tulloch wrote:


 After a random but very long period of time (i.e. very many successful
 calls) I get a SEGV in the built-in function fpc_AnsiStr_Decr_Ref.

 GDB reports the argument to fpc_AnsiStr_Decr_Ref (the string whose
 reference is to be decremented) is nil (i.e. 0x0).

 Prima facie, that's the reason for the SEGV, but how is it possible that
 the compiler would pass a nil pointer to this function in the first place?


The first thing fpc_AnsiStr_Decr_Ref does is check whether its
parameter is nil, and if so it immediately exits. It can be nil in
case the ansistring contains an empty string.


That routine itself also sets its argument to nil in case this was not
the case initially (it's a var-parameter), and I assume your crash
happens after this has been done.


 To put this into context, I'm running FPC 2.6.2 on a 32 bit Linux system
 executing in a multi-threaded application (which uses python threads and
 fpc threads). I have not found obvious evidence of memory corruption from
 other execution contexts or shared memory handling problems.


It's nevertheless most likely memory corruption. You can try compiling  
with -gv and running your program under valgrind to see whether it  
finds anything (you will probably get some false positives about  
certain RTL pchar routines such as strscan and strlen, but you can  
ignore those).



Jonas


Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

2013-05-08 Thread Bruce Tulloch
Thanks Jonas, that confirms what I suspected. Next time I trap an instance
of this (rare) fault I will inspect exactly which CPU instruction raised
the SEGV inside fpc_AnsiStr_Decr_Ref in search of a source of memory
corruption.


Bruce.



Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

2013-05-08 Thread Bruce Tulloch
Michael, thanks for your feedback.

One thing that confuses me in light of Jonas' reply, if what you say is
correct (that local variables that have just come into scope are not
guaranteed to be nil) then assignment of Result := ''; at the first line of
foo may arbitrarily SEGV because fpc_AnsiStr_Decr_Ref will interpret the
(possibly) non-nil value (of Result) as an AnsiString which (being a random
uninitialized value) will likely be incorrect and blow up.

Surely the semantics of string handling relies on FPC guaranteeing
automatic variables are always preassigned nil when they come into scope?

Put another way, how do fpc_AnsiStr_Decr_Ref and friends, which receive
the address of the caller's Result variable via their var parameter, know
whether the value of this parameter (which may not be initialized if what
you say is correct) is or is not a valid string?

Bruce.


Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

2013-05-08 Thread Bruce Tulloch
I've not managed to trap it again, but based on the information I have from
the last time it occurred I can say the error happened here:

--- a/rtl/i386/i386.inc
+++ b/rtl/i386/i386.inc
@@ -1523,7 +1523,7 @@
 movl (%eax),%edx
 subl $8,%edx
 // [102] If l^<0 then exit;
 cmpl $0,(%edx)   <-- SEGV OCCURS HERE
 jl .Lj3596
 .Lj3603:
 // [104] If declocked(l^) then

That is, when testing the string length, the address of the length variable
appears to be duff.

I don't know what %edx was pointing to at the time (I hope to know next
time I trap it) but it was obviously wrong.

-b



Re: [fpc-pascal] FPC 2.6.2 throws SEGV in fpc_AnsiStr_Decr_Ref(). How is this possible?

2013-05-08 Thread Bruce Tulloch
So here's some more diagnostic at the point of the SEGV:

(gdb) disass
Dump of assembler code for function _$SYSTEM$_Ll1637:
=> 0x0118ace1 <+0>: cmpl   $0x0,(%edx)
End of assembler dump.
(gdb) i reg
eax            0xb6c77158   -1228443304
ecx            0xb6c76c04   -1228444668
edx            0xfff8       -8
ebx            0x12adbf8    19586040
esp            0xb6c75f5c   0xb6c75f5c
ebp            0xb6c75f70   0xb6c75f70
esi            0xb6c77020   -1228443616
edi            0xb6c77020   -1228443616
eip            0x118ace1    0x118ace1 <_$SYSTEM$_Ll1637>
eflags         0x210293     [ CF AF SF IF RF ID ]
cs             0x73         115
ss             0x7b         123
ds             0x7b         123
es             0x7b         123
fs             0x0          0
gs             0x33         51
(gdb) p $eax^
$4 = 0

This tells me that the test at the top of fpc_AnsiStr_Decr_Ref:

cmpl $0,(%eax)
jne .Ldecr_ref_continue
ret
.Ldecr_ref_continue:

passed (i.e. (%eax) was NOT nil) but sometime during the execution of the
following code:

// Temps allocated between ebp-24 and ebp+0
subl    $4,%esp
// Var S located in register
// Var l located in register
movl    %eax,(%esp)
// [101] l:=@PAnsiRec(S-FirstOff)^.Ref;
movl    (%eax),%edx
subl    $8,%edx
// [102] If l^<0 then exit;
cmpl    $0,(%edx)

the variable (%eax) MUST have been changed (to nil) BY ANOTHER THREAD.
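
As a sanity check on the register dump, here is a minimal sketch of the address arithmetic performed by line [101]. It assumes a header of two 4-byte fields in front of the string data (which is what the `subl $8,%edx` suggests); `TAnsiRecSketch`, `FirstOffSketch` and `RefAddress` are hypothetical names for illustration, not the RTL's own declarations.

```pascal
program DecrRefSketch;
{$mode objfpc}

type
  // Hypothetical stand-in for the ansistring header on a 32-bit target:
  // two 4-byte fields placed directly in front of the character data.
  TAnsiRecSketch = packed record
    Ref: LongInt;   // reference count
    Len: LongInt;   // string length
  end;

const
  // 8 bytes, matching the "subl $8,%edx" in the listing above.
  FirstOffSketch = SizeOf(TAnsiRecSketch);

// Models "movl (%eax),%edx; subl $8,%edx" with 32-bit wrap-around:
// given the string pointer S, returns the address of its Ref field.
function RefAddress(S: LongWord): LongWord;
begin
  Result := LongWord(Int64(S) - FirstOffSketch);
end;

begin
  // A nil string pointer that slips past the initial nil test
  // produces an address just below the top of the address space.
  WriteLn(HexStr(Int64(RefAddress(0)), 8));   // FFFFFFF8
  WriteLn(LongInt(RefAddress(0)));            // -8
end.
```

Under those assumptions, the -8 seen in %edx is exactly what results when the string pointer read by `movl (%eax),%edx` is already nil, which agrees with `p $eax^` printing 0.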

Is there any other plausible explanation I may have missed?

If there is no other explanation, then it means I need to find out how the
string variable referred to by (%eax) could have been accessed (or
even known to exist) by any other thread in the same address space.
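
To make the suspected window concrete, here is a single-threaded sketch of the control flow only: test the var-parameter, then read it again. The hook `Between` stands in for whatever another thread might do in between; all names here (`THook`, `CheckThenReload`, `ClearIt`) are hypothetical and this is not the RTL code itself.

```pascal
program DecrRefWindowSketch;
{$mode objfpc}

type
  // Hypothetical callback standing in for "another thread runs here".
  THook = procedure(var S: Pointer);

// Models only the control flow: test S for nil, then read S a second time.
// Returns True if the second read sees a different value than the first.
function CheckThenReload(var S: Pointer; Between: THook): Boolean;
var
  Seen: Pointer;
begin
  Result := False;
  Seen := S;                  // first read: "cmpl $0,(%eax)"
  if Seen = nil then
    exit;                     // nil string, nothing to do
  if Assigned(Between) then
    Between(S);               // simulated interference from elsewhere
  Result := S <> Seen;        // second read: "movl (%eax),%edx"
end;

// Simulated interfering write: clears the shared variable.
procedure ClearIt(var S: Pointer);
begin
  S := nil;
end;

var
  Dummy: LongInt;
  P: Pointer;
begin
  P := @Dummy;
  WriteLn(CheckThenReload(P, nil));        // FALSE: value is stable
  P := @Dummy;
  WriteLn(CheckThenReload(P, @ClearIt));   // TRUE: value changed after the test
end.
```

The sketch only shows that a write between the two reads is sufficient to produce the observed state; it says nothing about which code actually performs that write.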

If that variable is local to a function (i.e. foo's Result, with the SEGV
occurring upon its assignment as soon as it first comes into scope, per my
earlier email) then, absent a bug in FPC's handling of string references and
allocation, it seems impossible that it could be known to or referenced by
any other thread.

I'm reasonably confident there's no other way it could be overwritten by
another thread (i.e. I don't think there are any range or buffer pointer
errors anywhere else) so logic tells me I must have the wrong thesis or
there's a string handling error in FPC.

Any clues or insight, gratefully received :-)

Cheers, Bruce.

PS: I can't use valgrind in practice for a variety of reasons, not the
least of which is that I'd be unlikely to see the error for an extraordinarily
long time: slight changes to the code (and hence its execution time) made so
far have had a dramatic effect on the likelihood of this problem occurring
at all. It's clearly some sort of race condition over unprotected memory
somewhere.



On Thu, May 9, 2013 at 9:47 AM, Bruce Tulloch pas...@causal.com wrote:

 I've not managed to trap it again, but based on the information I have
 from the last time it occurred I can say the error happened here:

 --- a/rtl/i386/i386.inc
 +++ b/rtl/i386/i386.inc
 @@ -1523,7 +1523,7 @@
  movl    (%eax),%edx
  subl    $8,%edx
  // [102] If l^<0 then exit;
  cmpl    $0,(%edx) <-- SEGV OCCURS HERE
  jl      .Lj3596
  .Lj3603:
  // [104] If declocked(l^) then

 That is, when testing the string length, the address of the length
 variable appears to be duff.

 I don't know what %edx was pointing to at the time (I hope to know next
 time I trap it) but it was obviously wrong.

 -b


 On Thu, May 9, 2013 at 9:33 AM, Bruce Tulloch pas...@causal.com wrote:

 Thanks Jonas, that confirms what I suspected. Next time I trap an
 instance of this (rare) fault I will inspect exactly which CPU instruction
 raised the SEGV inside fpc_AnsiStr_Decr_Ref in search of a source of memory
 corruption.


 Bruce.


 On Wed, May 8, 2013 at 11:49 PM, Jonas Maebe jonas.ma...@elis.ugent.be wrote:


 On 08 May 2013, at 08:13, Bruce Tulloch wrote:

  After a random but very long period of time (i.e. very many successful
 calls) I get a SEGV in the built-in function fpc_AnsiStr_Decr_Ref.

 GDB reports the argument to fpc_AnsiStr_Decr_Ref (the string whose
 reference is to be decremented) is nil (i.e. 0x0).

 Prima facie, that's the reason for the SEGV, but how is it possible that
 the compiler would pass a nil pointer to this function in the first place?


 The first thing fpc_AnsiStr_Decr_Ref does is check whether its parameter
 is nil, and if so it immediately exits. It can be nil in case the
 ansistring contains an empty string.

 That routine itself also sets its argument to nil in case this was not
 the case initially (it's a var-parameter), and I assume your crash happens
 after this has been done.


  To put this into context, I'm running FPC 2.6.2 on a 32 bit Linux system
 executing in a multi-threaded application (which uses python threads and
 fpc threads). I have not found obvious evidence of memory corruption from
 other execution contexts or shared memory handling problems.


 It's nevertheless most