if i load the attached module on my host, the link winds up in a curious
state.  the intent of the module is to duplicate a particular type of
kernel hang that blocks all the cpus from handling any work.

what happens is that the sma stops responding:

        # ibportstate  90 1
        ibportstate: iberror: failed: smp query nodeinfo failed

but the switch port on the other end of the link still reports a valid
state:

        # ibportstate  70 18
        PortInfo:
        # Port info: Lid 70 port 18
        LinkState:.......................Active
        PhysLinkState:...................LinkUp
        LinkWidthSupported:..............1X or 4X
        LinkWidthEnabled:................1X or 4X
        LinkWidthActive:.................4X
        LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps
        LinkSpeedEnabled:................2.5 Gbps
        LinkSpeedActive:.................2.5 Gbps
        ibwarn: [6758] _do_madrpc: recv failed: Connection timed out
        ibportstate: iberror: failed: smp query nodeinfo failed

we believe that the link layer is handled entirely in the firmware,
which has no idea that the sma part in the kernel has gone to sleep.
the periodic light sweeps by opensm don't seem to discover this
problem either.

this type of failure tends to make the ib utilities that scan the network
run rather slowly.  ibdiagnet does indeed spot this broken host, but
perhaps the sm could be extended to attempt to do something about this
host, like reset the switch port?  should it really require manual
intervention to clear this error?
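
as a rough sketch of what such an extension might look like (this is not
opensm code; every name below, including sweep_update and the
MISSED_LIMIT threshold of 3, is made up for illustration), the sm could
count consecutive sweeps where the peer switch port is still Active but
the host's sma fails to answer, and only ask for a port reset once that
count reaches the limit:

```c
#include <stdbool.h>

/* hypothetical link states as seen on the peer switch port */
enum link_state { LINK_DOWN, LINK_INIT, LINK_ARMED, LINK_ACTIVE };

/* hypothetical decisions a sweeper could make for one host port */
enum action { ACT_NONE, ACT_WATCH, ACT_RESET_PEER_PORT };

/* per-port bookkeeping kept between sweeps (assumed, not from opensm) */
struct port_track {
        unsigned int missed;    /* consecutive sweeps with no sma reply */
};

#define MISSED_LIMIT 3          /* assumed threshold, chosen arbitrarily */

/*
 * call once per sweep for each host port.
 * peer_state:  link state reported by the switch port on the other end
 * sma_replied: whether the host answered the smp query this sweep
 */
static enum action sweep_update(struct port_track *t,
                                enum link_state peer_state,
                                bool sma_replied)
{
        /* a healthy sma, or a link that is not up, is not this failure */
        if (sma_replied || peer_state != LINK_ACTIVE) {
                t->missed = 0;
                return ACT_NONE;
        }

        /* link is Active but the sma is silent: keep counting */
        if (++t->missed < MISSED_LIMIT)
                return ACT_WATCH;

        /* limit reached: suggest a reset and start counting afresh */
        t->missed = 0;
        return ACT_RESET_PEER_PORT;
}
```

with a zeroed port_track, three sweeps in a row with LINK_ACTIVE and no
sma reply give ACT_WATCH, ACT_WATCH, then ACT_RESET_PEER_PORT; any reply
from the sma clears the counter.  how the reset itself would be issued
is left out of the sketch.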

/* doom.c -- reliably wedge an smp kernel 
 *
 * build:
 *        echo 'obj-m   += doom.o' > Makefile
 *        make -C /lib/modules/`uname -r`/build M=`pwd`
 *
 * usage:
 *        insmod doom.ko
 */

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/spinlock.h>
#include <linux/smp.h>

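/* runs on each cpu: turn off local interrupts and spin forever */
static void wedge(void *data)
{
        unsigned long flags;
        spinlock_t lock;

        printk(KERN_ERR "goodbye cruel world...\n");

        /* the lock is private and uncontended; it is taken only so
         * that irqsave disables interrupts on this cpu */
        spin_lock_init(&lock);
        spin_lock_irqsave(&lock, flags);

        while (1)
                /* do nothing */;
}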

static int __init doom_init(void)
{
        int i;

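        /* wedge every other cpu first, without waiting for the
         * call to finish (it never will) */
        for_each_possible_cpu(i) {
                if (i != smp_processor_id())
                        smp_call_function_single(i, wedge, NULL, 0, 0);
        }

        /* then wedge the cpu that is running insmod */
        smp_call_function_single(smp_processor_id(), wedge, NULL, 0, 0);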

        return 0;
}

module_init(doom_init);

MODULE_AUTHOR("chas williams <[EMAIL PROTECTED]>");
MODULE_DESCRIPTION("wedge the kernel but good");
MODULE_LICENSE("GPL");