Wow - thanks to all who responded to my query so quickly!
It looks as though increasing the value of the kernel variable "physmem"
will enlarge the relevant kernel memory pool and thus make Ultrix and AFS happy.

Again, thanks to all who responded (messages included below).

Pat



I wrote:
> We've been seeing machine hangs every once in a while (twice in the last
> month on a very heavily used machine) on our DEC 5000/200 (Ultrix 4.4, AFS 3.3
> client) - the machine locks up and prints "cant get mbufs" to the console
> until it's rebooted.  It could be _lots_ of things, obviously (since we
> can't do anything once it's hung, though, it's hard to tell)...  Has anyone
> else seen this sort of behavior?

I received the following replies:

From: Walter Wong <[EMAIL PROTECTED]>

        There could be two problems:

        (1) Your users are not unlog'ing when they log out. This results in a
        memory leak in the kernel and so the machine will crash or hang after
        a while. We ran into this under 4.2 and have a program that will go
        through and clean up this stuff. This may require you to use the
        '.krb' AFS programs.

        Anyway for this to occur, you will have to have a good number of
        people logging in and out between reboots.

        (2) There is a patch from DEC that sets a GUARDPAGES boundary which
        limits the amount of allocatable kernel memory (or something like
        that). The magic patch number is 7bxb35390 (though this may be for
        4.3a and not 4.4). However, if you find one, you should find the
        other.

        I'd start by looking for the kernel patch unless it is very clear that
        what is happening is #1.  We had a real hard time getting the patch. I
        heard that it would become a 'generally released patch soon' a few
        months ago, but who knows? If you run into problems, let me know and I
        can try yelling at some more people.


From: Gerhard Gonter <[EMAIL PROTECTED]>

        Hi, we had a problem with an Ultrix machine hanging frequently,
        especially when AFS was loaded.  Ours did not even bring any message.

        The problem was, as it turned out, that physmem was set to a much too
        small size in the kernel configuration.  After setting that value to
        the actual amount of memory, everything was fine.

        I hope that helps a little bit.

        +gg


From: Lynne Cohen Duncan <[EMAIL PROTECTED]>

        I had this problem in Ultrix 4.2a.  Here's what worked then:

        From DEC:
        ********************************************************************
        "cant get mbufs" means you ran out of km_alloc map space, or you
        ran short of physical memory (temporarily) and thus, dropped an
        input packet.

        You probably need to increase the km_alloc map on the system
        experiencing the problem.   Adjust the value of KMEMUMAP in the 
        sys/machine/vax/vmparam.h file.  [we modified sys/machine/mips/vmparam.h]

        Background:

        The message "cant get mbufs" is printed to the console by the kernel
        when a call to kmalloc has failed.  A kmalloc call can fail if there are
        not enough pte's [page table entries] available to map a given region
        of memory, or it can fail if there are pte's available but the physical
        memory is lacking at that particular instant.  A number of calls to the
        kmalloc kernel routine specify that the call is to wait for resources to
        become available should there be a temporary shortage.  The call in
        question, which results in the "cant get mbufs" message, does not specify
        that a wait should take place.  Thus, the kmalloc routine returns
        immediately with an error message, the kernel prints a message
        indicating the unavailability of resources, and the network packet is
        discarded.

        This most likely occurs during a network "spike" whereby a host is 
        flooded with hundreds of incoming packets during a brief instant, and it 
        is unable to process all of the packets.

        If this happens only randomly and without much frequency, you can ignore 
        it.  If the machine hangs or locks up following the message, or if you 
        see this message often, you should increase the value of KMEMUMAP.
        *******************************************************************
        KMEMUMAP is dependent on the value of PHYSMEM, which is set to 32 on
        our servers.  The code in the vmparam.h file that specifies KMEMUMAP
        is as follows:

        #if (PHYSMEM/10 < LOW_WATER)
        #define KMEMUMAP (LOW_WATER*256+KMEMSLOP)
        #else
        #if (PHYSMEM/10 > HIGH_WATER)
        #define KMEMUMAP (HIGH_WATER*256+KMEMSLOP)
        #else
        #define KMEMUMAP (PHYSMEM/10*256+KMEMSLOP)
        #endif
        #endif

        I changed it to be just the following:

        #define KMEMUMAP (HIGH_WATER*256+KMEMSLOP)

        This triples the actual value.


From: [EMAIL PROTECTED] (perry l morgan)

        Yes, but on Ultrix 4.3.

        This was Transarc's solution:

        > AFS allocates kernel memory for internal data structures, and at
        > several AFS sites, AFS has tried to allocate memory which is available,
        > technically, but which Ultrix thinks is not available, which results
        > in a system hang.  AFS allocates memory from a particular memory pool
        > in the kernel.  The size of that memory pool is calculated by Ultrix
        > based on the system parameter "physmem". Therefore, the solution in
        > these situations is to increase the size of physmem, even to values
        > much larger than the actual memory on the machine.  In our testing at
        > Transarc, we found that setting this value to 2 times (or more) the
        > size of physical memory provided a reasonable amount of kernel memory
        > for AFS data structures, and alleviated system hangs.

        As for the results, it did help.   I have other Ultrix things I'm
        fighting now.

From: Kevin Hildebrand <[EMAIL PROTECTED]>

        We had major problems here getting AFS to run on our Ultrix 4.4
        machines- sometimes the machines would hang with the message you
        mentioned above, and sometimes they'd just hang with no messages at
        all.  After many mail messages back and forth to Transarc, we found
        out two things that needed to be done.

        1) Rebuild the kernel such that the following parameters are adjusted:

           a. Increase physmem to be 2 times physical memory
           b. Increase maxusers to be 2-4 times the current value

        2) There is an Ultrix 4.4 patch that solved most of the rest of the
        hangs- it is also related to problems with kernel memory allocation.

        I include below the notes I got from Transarc about the kernel memory
        problem- they talk about Ultrix 4.3 but they are applicable to 4.4 as
        well.  I also include the description and patch number for 4.4.

        These changes have fixed 99% of our hang problems, and I suspect the
        remaining problems we have are not related to AFS...

