Dave,
 
I don't think there is a misunderstanding on our part.  A simple test to determine whether the memory leak is in the LiS stack or in our code would be for us to turn off the MSG_TRACE define in LiS.  If we undefine MSG_TRACE and the memory leak goes away, then that would say that there is a bug in the code contained within MSG_TRACE.  If the problem does not go away, then that would imply there is a problem with our code.  So, we undefined MSG_TRACE and poof, the memory leak is gone.  So that implies that there is something in the code defined by MSG_TRACE that isn't right.
 
Also, there is clearly some confusion in the README.DEBUG file regarding this issue.  If the lis_mem_head contains only memory that has been allocated and not freed, then why would you ever propose to diff the output of streams -m at time 't' and time 't+n'?  Diff'ing these 2 files makes no sense because the memory that was allocated at time 't' may have been freed by time 't+n', so if you diff'ed the files you would get a concatenation of the 2 files when all you want is what is in the file at time 't+n'.  The only thing diff'ing the 2 files would get you is it would remove any memory that was still allocated but not freed at both time 't' and time 't+n'.  But couldn't that be part of the leak?  So, diff'ing the files in this case would only serve to mask the problem.  The only way this diff'ing suggestion in README.DEBUG makes sense is if streams -m shows all the memory that has ever been allocated.
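To make the snapshot argument concrete, here is a minimal sketch of the comparison, not LiS code. It assumes each line of streams -m output can be reduced to a numeric allocation identifier and that both snapshots are sorted; the function name and that representation are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count entries of the later snapshot that were already present in the
 * earlier one.  Both arrays hold hypothetical allocation identifiers
 * sorted in ascending order. */
size_t count_survivors(const unsigned long *early, size_t n_early,
                       const unsigned long *late, size_t n_late)
{
    size_t i = 0, j = 0, common = 0;

    while (i < n_early && j < n_late) {
        if (early[i] < late[j]) {
            i++;                 /* freed between the two snapshots */
        } else if (early[i] > late[j]) {
            j++;                 /* allocated after the first snapshot */
        } else {
            common++;            /* still allocated at both times */
            i++;
            j++;
        }
    }
    return common;
}
```

The entries counted here are exactly the ones a plain diff of the two files would drop, which is why they deserve a look rather than being filtered out.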
 
And why would you ever mark memory as 'free' but leave it on lis_mem_head?  That is exactly what happens in freehdr.  You do not free the message header until you have more than 10 message headers allocated.  Instead you just mark that memory as 'free' and put it on the lis_mdbfreelist.  But, it doesn't look like it was ever taken off the lis_mem_head.  That looks like a no-no to me.
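A rough sketch of the pattern described above, written from the description only and not checked against the LiS source. All names here (hdr_t, hdr_alloc, hdr_release, the counters) are hypothetical, and hdr_tracked merely stands in for membership on a tracking list such as lis_mem_head.

```c
#include <assert.h>
#include <stdlib.h>

#define HDR_CACHE_MAX 10         /* threshold taken from the description above */

/* Hypothetical header type; not the LiS definition. */
typedef struct hdr {
    struct hdr *free_next;       /* link on the cache of reusable headers */
    int         in_use;
} hdr_t;

hdr_t *hdr_free_list;            /* cached headers, marked free */
int    hdr_cached;               /* length of hdr_free_list */
int    hdr_tracked;              /* headers still known to the debug tracker */

hdr_t *hdr_alloc(void)
{
    hdr_t *h = hdr_free_list;

    if (h != NULL) {             /* reuse a cached header */
        hdr_free_list = h->free_next;
        hdr_cached--;
    } else {
        h = malloc(sizeof(*h));
        if (h == NULL)
            return NULL;
        hdr_tracked++;           /* tracker learns about a new block */
    }
    h->free_next = NULL;
    h->in_use = 1;
    return h;
}

void hdr_release(hdr_t *h)
{
    h->in_use = 0;
    if (hdr_cached < HDR_CACHE_MAX) {
        h->free_next = hdr_free_list;   /* cached: tracker still lists it */
        hdr_free_list = h;
        hdr_cached++;
    } else {
        hdr_tracked--;                  /* really freed: tracker forgets it */
        free(h);
    }
}
```

In this sketch a dump of the tracker after everything has been released would still show the cached headers, even though the caller considers them freed.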
 
I don't know if you can re-create the problem that we are having, but we are more than willing to help you debug this if you want us to.  I'm not sure what we could send you that would help, but we will do what we can to help.  We have a customer who requested a STREAMS driver under Linux and we want to provide them with the best possible product that we can.  I do not want to have to tell them to edit a source file and recompile LiS to undefine MSG_TRACE for every system they want to ship because they will want to know why, and we will have to tell them that we believe that there are some problems with LiS, and then that is going to cast a shadow over LiS in particular and Linux in general.  And I don't think any of us wants any aspersions cast on either.  I was thrilled when Alex told me about LiS because I have been programming with STREAMS for years under Solaris and I always wanted a version to use under Linux.  I am also very appreciative of the lengths that you have gone to in providing debugging tools for streams.  These kinds of things are sorely lacking under other operating systems.  So, if we can work together to find these problems then everyone who wants to use LiS and Linux will benefit.
 
As a last note, I'm not sure this was stated before but this code is being ported from our Solaris code.  We have had this code for a few years and it is stable under Solaris.  There is not a huge difference between this code and our Solaris code.  As a matter of fact, the STREAMS portion has not been touched at all for this port.  The only differences are the usual platform dependencies concerning how you get called to initialize a dynamically loaded driver, finding PCI hardware, mapping hardware registers, DMA from/to PCI space, etc.  So it seems unlikely that this would work well under Solaris and not at all under Linux if there weren't some problems with LiS.
 
Paul
-----Original Message-----
From: David Grothe [mailto:[EMAIL PROTECTED]]
Sent: Friday, October 29, 1999 5:54 AM
To: [EMAIL PROTECTED]
Cc: lis-list; Paul Stillwell
Subject: Re: LiS memory debugging concerns

Alex:

There is definitely some kind of misunderstanding here.

The following code, in strmdbg.c lis_free:

p = (mem_link_t *) ptr ;
p-- ;                                       /* go back to link structure */

lis_mem_alloced -= p->size ;                /* keep track of memory */
LisDownCounter(MEMALLOCD, p->size) ;        /* stats array */

p->prev->next = p->next ;                   /* prev elt links around us */
p->next->prev = p->prev ;                   /* next elt links around us */
p->next       = NULL ;                      /* clobber our links */
p->prev       = NULL ;

This code delinks freed memory from the list.  The lis_mem_head is of type mem_link_t and is automatically updated by the link manipulations even though not mentioned by name.  The list does _not_ grow without bound.
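For readers who want to see why the head is updated without being named, here is a minimal self-contained sketch of a circular doubly linked list with a sentinel head. It is an illustration under assumptions, not the LiS implementation: the struct layout, the insertion position, and the names mem_head, mem_alloced, trk_alloc, trk_free and trk_count are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for mem_link_t; field names follow the excerpt. */
typedef struct mem_link {
    struct mem_link *next;
    struct mem_link *prev;
    size_t           size;
} mem_link_t;

/* Sentinel head of a circular list: an empty list points at itself. */
mem_link_t mem_head = { &mem_head, &mem_head, 0 };
size_t     mem_alloced;

/* Allocate size bytes preceded by a link header and put it on the list. */
void *trk_alloc(size_t size)
{
    mem_link_t *p = malloc(sizeof(*p) + size);

    if (p == NULL)
        return NULL;
    p->size = size;
    p->next = mem_head.next;      /* insert right after the sentinel */
    p->prev = &mem_head;
    mem_head.next->prev = p;
    mem_head.next = p;
    mem_alloced += size;
    return p + 1;                 /* caller sees the area after the header */
}

void trk_free(void *ptr)
{
    mem_link_t *p = (mem_link_t *) ptr;

    p--;                          /* go back to link structure */
    mem_alloced -= p->size;
    p->prev->next = p->next;      /* if p->prev is the sentinel, this updates the head */
    p->next->prev = p->prev;
    p->next = NULL;
    p->prev = NULL;
    free(p);
}

/* Number of blocks currently on the list. */
size_t trk_count(void)
{
    size_t n = 0;

    for (mem_link_t *p = mem_head.next; p != &mem_head; p = p->next)
        n++;
    return n;
}
```

Because the head is itself a list element, unlinking the first or last block writes through p->prev or p->next straight into the head, so no special case is needed.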

As concerns lis_max_mem, C language semantics say that a global variable always has the initial value of 0, not some random garbage.  If you find some non-zero value in that location it is because something, intentionally or unintentionally, stored a value there.  Not because of random memory state at load time.
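A small sketch of the zero-initialization rule for objects with static storage duration. The variable and function names are hypothetical stand-ins, and treating 0 as "no limit" is an assumption made for the example, not a statement about how LiS interprets lis_max_mem.

```c
#include <assert.h>

/* Hypothetical stand-in for a limit variable such as lis_max_mem.
 * With no initializer, an object at file scope starts out as 0. */
long max_mem_limit;

/* Returns 1 if an allocation of size bytes would be allowed.
 * Assumption for this sketch only: a limit of 0 means "unlimited". */
int alloc_allowed(long in_use, long size)
{
    if (max_mem_limit == 0)
        return 1;
    return in_use + size <= max_mem_limit;
}
```

Under that assumption, a check like this only starts refusing allocations once something has stored a non-zero value in the limit.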

I think it would be a good idea for you to investigate your test and drivers a bit more to see if you have a wild pointer somewhere that is overwriting lis_max_mem.

-- Dave

Alex Chamberlain wrote:

This is partly a follow-up to the message titled "LiS memory problems?" that
my colleague Paul posted to the list a few days ago.  After close study of
the memory management code in head.c and strmdbg.c we have some concerns
with the allocation tracking scheme, the way it is configured, and the way
it is described in README.DEBUG.

1.  It would be nice if the configuration script allowed you to change
whether the symbol MSG_TRACE is defined, and also if the default were to
have MSG_TRACE off, for reasons to follow.

2.  When MSG_TRACE is defined, the queue formed on lis_mem_head grows
without bound (except as described in 3) since it keeps track of all the
memory ever allocated by LiS---not "all in-use allocated memory areas" as
stated in README.DEBUG, which implies to us the memory _currently_
allocated, not all the memory _ever_ allocated.

3.  The biggest problem is that the lis_mem_head queue's growth is only
limited by lis_max_mem, which is set by the (undocumented) -C option to
streams.c.  If the -C option is not used, then lis_max_mem's value is
undefined (since it is never set to a default value), and thus the number of
allocations allowed by the debug allocator is limited nondeterministically.

We discovered these problems when we had a test running continuously,
passing many small messages (100 or so per second) on a small, slow test
machine (Pentium 100 with 32 MB).  With MSG_TRACE not defined, the test ran
fine.  With MSG_TRACE defined, the test died as soon as we hit lis_max_mem
(whatever it happened to be on that boot).

To summarize, our main complaint is that MSG_TRACE is on by default (in LiS
2.5) but nowhere in the documentation does it say "The MSG_TRACE symbol, if
defined, is helpful when debugging memory problems, but should not be used
in production code since it will cause streams to break quickly in
continuous operation".  Instead we spent several days discovering this for
ourselves.  Not to look a gift horse in the mouth, but our fear is that this
will reflect badly on LiS.  This bug might encourage people to think that
LiS has memory leakage problems and is generally unstable, since this is the
default behavior, when in fact when compiled without debug options it seems
quite reliable.

---
Alex Chamberlain
[EMAIL PROTECTED]
Polaris Communications Inc.
"Products that unite the data center"
