Good to know. The system was a RHEL 5 based system.

-----Original Message-----
From: Ralph Campbell [mailto:[EMAIL PROTECTED]
Sent: Monday, November 26, 2007 1:35 PM
To: Robert Pearson
Cc: [EMAIL PROTECTED]; 'Arthur Jones'
Subject: Re: [ofa-general] ipath crash

2.6.18 has a bug in the vmalloc_user() code which causes this.
The thing to do is use a new version of the kernel (2.6.20+ I think).

On Mon, 2007-11-26 at 11:37 -0600, Robert Pearson wrote:
> Here is the right crash
>
> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at mm/slab.c:2649
> invalid opcode: 0000 [1] SMP
> last sysfs file: /class/infiniband/ipath0/node_type
> CPU 7
> Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc
>  rdma_ucm(U) ib_srp(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U)
>  ib_uverbs(U) ib_umad(U) ib_mthca(U) ib_ipoib(U) ib_cm(U) ib_sa(U)
>  ib_mad(U) ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack
>  nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp
>  ip6table_filter ip6_tables x_tables ipv6 dm_mirror dm_mod video sbs
>  i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac
>  parport_pc lp parport sg ib_ipath(U) ide_cd ib_core(U) serio_raw
>  cdrom bnx2 shpchp pcspkr mptsas mptscsih mptbase scsi_transport_sas
>  sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
> Pid: 8101, comm: fragment Not tainted 2.6.18-8.1.15.el5 #1
> RIP: 0010:[<ffffffff80016ebb>] [<ffffffff80016ebb>] cache_grow+0x1e/0x395
> RSP: 0018:ffff810010c3dcb8 EFLAGS: 00010006
> RAX: 0000000000000000 RBX: 00000000000080d0 RCX: 00000000ffffffff
> RDX: 0000000000000000 RSI: 00000000000080d0 RDI: ffff810037ff43c0
> RBP: ffff81003ffa06e0 R08: ffff8100020bc280 R09: ffff810037e64400
> R10: ffff810010c3de68 R11: 000000000000555c R12: ffff810037ff43c0
> R13: ffff81003ffa06c0 R14: 0000000000000000 R15: ffff810037ff43c0
> FS: 00002aaaaaad7440(0000) GS:ffff8100020bf340(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00002aaaaaaac000 CR3: 0000000011a7f000 CR4: 00000000000006e0
> Process fragment (pid: 8101, threadinfo ffff810010c3c000, task ffff81002cdd3820)
> Stack: 0000000000000000 0000000000000001 0000000000000296 0000000000000001
>  ffff810010c3dd18 00000000ffffffff ffff81003ffa06e0 ffff8100020bc280
>  ffff81003ffa06c0 000000000000000c ffff810037ff43c0 ffffffff8005a5ce
> Call Trace:
>  [<ffffffff8005a5ce>] cache_alloc_refill+0x136/0x186
>  [<ffffffff800cc5dc>] kmem_cache_alloc_node+0x98/0xb2
>  [<ffffffff800c2ae8>] __vmalloc_area_node+0x62/0x153
>  [<ffffffff800c2e36>] vmalloc_user+0x15/0x50
>  [<ffffffff88180579>] :ib_ipath:ipath_create_cq+0x67/0x1d6
>  [<ffffffff80062126>] __down_write_nested+0x12/0x92
>  [<ffffffff884266cd>] :ib_uverbs:ib_uverbs_create_cq+0x143/0x259
>  [<ffffffff884231ce>] :ib_uverbs:ib_uverbs_write+0x93/0xa9
>  [<ffffffff8011a55d>] selinux_file_permission+0x9f/0xb6
>  [<ffffffff80016122>] vfs_write+0xce/0x174
>  [<ffffffff800169b3>] sys_write+0x45/0x6e
>  [<ffffffff8005b349>] tracesys+0xd1/0xdc
>
> The last one was from an older crash that I picked up by mistake.
>
> Bob
>
> _______________________________________________
> general mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
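
For anyone checking whether a host falls in the affected range, here is a minimal sketch that compares the running kernel release against the 2.6.20 threshold mentioned in the reply. The threshold is only the "2.6.20+ I think" guess from this thread, not a confirmed fix version, and the helper names (`parse_kernel_release`, `is_older_than_suggested`) are made up for illustration. Distribution kernels such as RHEL's often carry backported fixes, so the version number alone is a hint, not proof.

```python
import platform
import re

# Assumption: threshold taken from the "2.6.20+ I think" remark in this thread.
MIN_SUGGESTED = (2, 6, 20)


def parse_kernel_release(release):
    """Return (major, minor, patch) from a string like '2.6.18-8.1.15.el5'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    # A missing patch level (e.g. '5.14') is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def is_older_than_suggested(release, minimum=MIN_SUGGESTED):
    """True if the upstream version in `release` is below `minimum`."""
    return parse_kernel_release(release) < minimum


if __name__ == "__main__":
    running = platform.release()
    if is_older_than_suggested(running):
        print("%s is older than the suggested 2.6.20" % running)
    else:
        print("%s is at or above the suggested 2.6.20" % running)
```

The release string from the oops above, `2.6.18-8.1.15.el5`, parses to `(2, 6, 18)` and is reported as older than the suggested version.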
