I never uninstalled it (I still use some of the tools in it). Faultmond is a service; just chkconfig it off.
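For anyone who wants the exact steps, here is a rough sketch for a RHEL 4-style SysV init setup. It assumes the init script is registered under the name "faultmond" (the name that appears as the syslog tag); check with `chkconfig --list` if unsure. The `run` and `disable_service` helpers and the `DRY_RUN` variable are made up for illustration, so the commands can be previewed before running them as root.

```shell
#!/bin/sh
# Sketch only. Assumption: the DCMU RPM registers its init script as
# "faultmond"; confirm with: chkconfig --list | grep -i fault

# Hypothetical helper: print the command when DRY_RUN is set,
# otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: stop the service now and keep it from
# starting again at boot.
disable_service() {
    run service "$1" stop
    run chkconfig "$1" off
}
```

Running `DRY_RUN=1 disable_service faultmond` only prints the two commands; running `disable_service faultmond` as root applies them. The RPM and its other tools stay installed either way.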

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985

On Oct 13, 2008, at 11:03 AM, Malcolm Cowe wrote:

> Brock Palen wrote:
>>
>> I know you say the only addition was the RDAC for the MDS's I
>> assume (we use it also just fine).
> Yes, the MDS's share a STK 6140.
>> When I ran faultmond from Sun's DCMU RPM (RHEL 4 here) the x4500's
>> would crash like clockwork ~48 hours. For a very simple bit of
>> code I was surprised that once when I forgot to turn it on when
>> working on the load this would happen. Just FYI it was unrelated
>> to Lustre (using provided RPMs, no kernel build); this solved my
>> problem on the x4500.
> The DCMU RPM is installed. I didn't explicitly install this, so it
> must have been bundled in with the SIA CD... I'll try removing the
> rpm to see what happens. Thanks for the heads up.
>
> Regards,
>
> Malcolm.
>
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> [EMAIL PROTECTED]
>> (734)936-1985
>>
>> On Oct 13, 2008, at 4:41 AM, Malcolm Cowe wrote:
>>>
>>> The X4200m2 MDS systems and the X4500 OSS were rebuilt using the
>>> stock Lustre packages (Kernel + modules + userspace). With the
>>> exception of the RDAC kernel module, no additional software was
>>> applied to the systems. We recreated our volumes and ran the
>>> servers over the weekend. However, the OSS crashed about 8 hours
>>> in. The syslog output is attached to this message.
>>>
>>> Looks like it could be similar to bug #16404, which means patching
>>> and rebuilding the kernel. Given my lack of success at trying to
>>> build from source, I am again asking for some guidance on how to
>>> do this. I sent out the steps I used to try and build from source
>>> on the 7th because I was encountering problems and was unable to
>>> get a working set of packages. Included in that message was
>>> output from quilt that implies that the kernel patching process
>>> was not working properly.
>>>
>>> Regards,
>>> Malcolm.
>>> --
>>> Malcolm Cowe
>>> Solutions Integration Engineer
>>>
>>> Sun Microsystems, Inc.
>>> Blackness Road
>>> Linlithgow, West Lothian EH49 7LR UK
>>> Phone: x73602 / +44 1506 673 602
>>> Email: [EMAIL PROTECTED]
>>>
>>> Oct 10 06:49:39 oss-1 kernel: LDISKFS FS on md15, internal journal
>>> Oct 10 06:49:39 oss-1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
>>> Oct 10 06:53:42 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 06:53:42 oss-1 kernel: LDISKFS FS on md16, internal journal
>>> Oct 10 06:53:42 oss-1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
>>> Oct 10 06:57:49 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 06:57:49 oss-1 kernel: LDISKFS FS on md17, internal journal
>>> Oct 10 06:57:49 oss-1 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
>>> Oct 10 07:44:55 oss-1 faultmond: 16:Polling all 48 slots for drive fault
>>> Oct 10 07:45:00 oss-1 faultmond: Polling cycle 16 is complete
>>> Oct 10 07:56:23 oss-1 kernel: Lustre: OBD class driver, [EMAIL PROTECTED]
>>> Oct 10 07:56:23 oss-LDISKFS-fs: file extents enabled1 kernel: Lustre VersionLDISKFS-fs: mballoc enabled : 1.6.5.1
>>> Oct 10 07:56:23 oss-1 kernel: Build Version: 1.6.5.1-19691231190000-PRISTINE-.cache.OLDRPMS.20080618230526.linux-smp-2.6.9-67.0.7.EL_lustre.1.6.5.1.x86_64-2.6.9-67.0.7.EL_lustre.1.6.5.1smp
>>> Oct 10 07:56:24 oss-1 kernel: Lustre: Added LNI [EMAIL PROTECTED] [8/64]
>>> Oct 10 07:56:24 oss-1 kernel: Lustre: Lustre Client File System; [EMAIL PROTECTED]
>>> Oct 10 07:56:24 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on md11, external journal on md21
>>> Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 07:56:24 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 07:56:24 oss-1 kernel: LDISKFS FS on md11, external journal on md21
>>> Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: file extents enabled
>>> Oct 10 07:56:24 oss-1 kernel: LDISKFS-fs: mballoc enabled
>>> Lustre: Request x1 sent from [EMAIL PROTECTED] to NID [EMAIL PROTECTED] 5s ago has timed out (limit 5s).
>>> Oct 10 07:56:30 oss-1 kernel: Lustre: Request x1 sent from [EMAIL PROTECTED] to NID [EMAIL PROTECTED] 5s ago has timed out (limit 5s).
>>> LustreError: 4685:0:(events.c:55:request_out_callback()) @@@ type 4, status -113 [EMAIL PROTECTED] x3/t0 o250->[EMAIL PROTECTED]@o2ib_1:26/25 lens 240/400 e 0 to 5 dl 1223621815 ref 2 fl Rpc:/0/0 rc 0/0
>>> Lustre: Request x3 sent from [EMAIL PROTECTED] to NID [EMAIL PROTECTED] 0s ago has timed out (limit 5s).
>>> LustreError: 18125:0:(obd_mount.c:1062:server_start_targets()) Required registration failed for lfs01-OSTffff: -5
>>> LustreError: 15f-b: Communication error with the MGS. Is the MGS running?
>>> LustreError: 18125:0:(obd_mount.c:1597:server_fill_super()) Unable to start targets: -5
>>> LustreError: 18125:0:(obd_mount.c:1382:server_put_super()) no obd lfs01-OSTffff
>>> LustreError: 18125:0:(obd_mount.c:119:server_deregister_mount()) lfs01-OSTffff not registered
>>> LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
>>> LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
>>> LDISKFS-fs: mballoc: 0 generated and it took 0
>>> LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
>>> Oct 10 07:56:50 oss-1 kernel: Lustre: Changing connection for [EMAIL PROTECTED] to MGC192.1Lustre: server umount lfs01-OSTffff complete [EMAIL PROTECTED]: 18125:0:(obd_mount.c:1951:lustre_fill_super()) Unable to mount (-5) / [EMAIL PROTECTED]
>>> Oct 10 07:56:50 oss-1 kernel: LustreError: 4685:0:(events.c:55:request_out_callback()) @@@ type 4, status -113 [EMAIL PROTECTED] x3/t0 o250->[EMAIL PROTECTED]@o2ib_1:26/25 lens 240/400 e 0 to 5 dl 1223621815 ref 2 fl Rpc:/0/0 rc 0/0
>>> Oct 10 07:56:50 oss-1 kernel: Lustre: Request x3 sent from [EMAIL PROTECTED] to NID [EMAIL PROTECTED] 0s ago has timed out (limit 5s).
>>> Oct 10 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c:1062:server_start_targets()) Required registration failed for lfs01-OSTffff: -5
>>> Oct 10 07:56:50 oss-1 kernel: LustreError: 15f-b: Communication error with the MGS. Is the MGS running?
>>> Oct 10 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c:1597:server_fill_super()) Unable to start targets: -5
>>> Oct 10 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c:1382:server_put_super()) no obd lfs01-OSTffff
>>> Oct 10 07:56:50 oss-1 kernel: LustreError: 18125:0:(obd_mount.c:119:server_deregister_mount()) lfs01-OSTffff not registered
>>> Oct 10 07:56:50 oss-1 kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
>>> Oct 10 07:56:50 oss-1 kernel: LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
>>> Oct 10 07:56:51 oss-1 kernel: LDISKFS-fs: mballoc: 0 generated and it took 0
>>> Oct 10 07:56:51 oss-1 kernel: LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
>>> Oct 10 07:56:51 oss-1 kernel: Lustre: server umount lfs01-OSTffff complete
>>> Oct 10 07:56:51 oss-1 kernel: LustreError: 18125:0:(obd_mount.c:1951:lustre_fill_super()) Unable to mount (-5)
>>> LustreError: 6644:0:(events.c:55:request_out_callback()) @@@ type 4, status -113 [EMAIL PROTECTED] x1/t0 o250->[EMAIL PROTECTED]@o2ib_1:26/25 lens 240/400 e 0 to 5 dl 1223621790 ref 1 fl Complete:EX/0/0 rc -110/0
>>> Oct 10 07:57:15 oss-1 kernel: LustreError: 6644:0:(events.c:55:request_out_callback()) @@@ type 4, status -113 [EMAIL PROTECTED] x1/t0 o250->[EMAIL PROTECTED]@o2ib_1:26/25 lens 240/400 e 0 to 5 dl 1223621790 ref 1 fl Complete:EX/0/0 rc -110/0
>>> Oct 10 08:04:09 oss-1 sshd(pam_unix)[18530]: session opened for user root by root (uid=0)
>>> LDISKFS-fs: file extents enabled
>>> LDISKFS-fs: mballoc enabled
>>> Lustre: lfs01-OST0000: new disk, initializing
>>> Lustre: Server lfs01-OST0000 on device /dev/md11 has started
>>> Oct 10 08:06:49 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:06:49 oss-1 kernel: LDISKFS FS on md11, external journal on md21
>>> Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:06:49 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:06:49 oss-1 kernel: LDISKFS FS on md11, external journal on md21
>>> Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: file extents enabled
>>> Oct 10 08:06:49 oss-1 kernel: LDISKFS-fs: mballoc enabled
>>> Oct 10 08:06:49 oss-1 kernel: Lustre: Filtering OBD driver; [EMAIL PROTECTED]
>>> Oct 10 08:06:49 oss-1 kernel: Lustre: lfs01-OST0000: new disk, initializing
>>> Oct 10 08:06:49 oss-1 kernel: Lustre: OST lfs01-OST0000 now serving dev (lfs01-OST0000/ccc68ac6-5b58-acd6-455b-2df9d2980009) with recovery enabled
>>> Oct 10 08:06:49 oss-1 kernel: Lustre: Server lfs01-OST0000 on device /dev/md11 has started
>>> Lustre: lfs01-OST0000: received MDS connection from [EMAIL PROTECTED]
>>> Oct 10 08:06:54 oss-1 kernel: Lustre: lfs01-OST0000: received MDS connection from [EMAIL PROTECTED]
>>> LDISKFS-fs: file extents enabled
>>> LDISKFS-fs: mballoc enabled
>>> Lustre: lfs01-OST0001: new disk, initializing
>>> Lustre: Server lfs01-OST0001 on device /dev/md12 has started
>>> Oct 10 08:06:56 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:06:56 oss-1 kernel: LDISKFS FS on md12, external journal on md22
>>> Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:06:56 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:06:56 oss-1 kernel: LDISKFS FS on md12, external journal on md22
>>> Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: file extents enabled
>>> Oct 10 08:06:56 oss-1 kernel: LDISKFS-fs: mballoc enabled
>>> Oct 10 08:06:56 oss-1 kernel: Lustre: lfs01-OST0001: new disk, initializing
>>> Oct 10 08:06:56 oss-1 kernel: Lustre: OST lfs01-OST0001 now serving dev (lfs01-OST0001/b2122e87-be36-bd1a-4e40-fdd41e626d0b) with recovery enabled
>>> Oct 10 08:06:56 oss-1 kernel: Lustre: Server lfs01-OST0001 on device /dev/md12 has started
>>> Lustre: lfs01-OST0001: received MDS connection from [EMAIL PROTECTED]
>>> Oct 10 08:07:01 oss-1 kernel: Lustre: lfs01-OST0001: received MDS connection from [EMAIL PROTECTED]
>>> LDISKFS-fs: file extents enabled
>>> LDISKFS-fs: mballoc enabled
>>> Lustre: lfs01-OST0002: new disk, initializing
>>> Lustre: Server lfs01-OST0002 on device /dev/md13 has started
>>> Oct 10 08:07:02 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:07:02 oss-1 kernel: LDISKFS FS on md13, external journal on md23
>>> Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:07:02 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:07:02 oss-1 kernel: LDISKFS FS on md13, external journal on md23
>>> Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: file extents enabled
>>> Oct 10 08:07:02 oss-1 kernel: LDISKFS-fs: mballoc enabled
>>> Oct 10 08:07:02 oss-1 kernel: Lustre: lfs01-OST0002: new disk, initializing
>>> Oct 10 08:07:02 oss-1 kernel: Lustre: OST lfs01-OST0002 now serving dev (lfs01-OST0002/13c66dfa-47c5-b350-43e3-3c3b67c358b6) with recovery enabled
>>> Oct 10 08:07:02 oss-1 kernel: Lustre: Server lfs01-OST0002 on device /dev/md13 has started
>>> Lustre: lfs01-OST0002: received MDS connection from [EMAIL PROTECTED]
>>> Oct 10 08:07:06 oss-1 kernel: Lustre: lfs01-OST0002: received MDS connection from [EMAIL PROTECTED]
>>> LDISKFS-fs: file extents enabled
>>> LDISKFS-fs: mballoc enabled
>>> Oct 10 08:07:08 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> OcLustre: lfs01-OST0003: new disk, initializing t 10 08:07:08 oss-1 kernel: LDISKFS FS on md15, external journalLustre: Server lfs01-OST0003 on device /dev/md15 has started on md25
>>> Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:07:08 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:07:08 oss-1 kernel: LDISKFS FS on md15, external journal on md25
>>> Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: file extents enabled
>>> Oct 10 08:07:08 oss-1 kernel: LDISKFS-fs: mballoc enabled
>>> Oct 10 08:07:08 oss-1 kernel: Lustre: lfs01-OST0003: new disk, initializing
>>> Oct 10 08:07:08 oss-1 kernel: Lustre: OST lfs01-OST0003 now serving dev (lfs01-OST0003/d6fd7a9d-3bb8-ae05-41ed-bbfb1b6b0303) with recovery enabled
>>> Oct 10 08:07:08 oss-1 kernel: Lustre: Server lfs01-OST0003 on device /dev/md15 has started
>>> Lustre: lfs01-OST0003: received MDS connection from [EMAIL PROTECTED]
>>> Oct 10 08:07:12 oss-1 kernel: Lustre: lfs01-OST0003: received MDS connection from [EMAIL PROTECTED]
>>> LDISKFS-fs: file extents enabled
>>> LDISKFS-fs: mballoc enabled
>>> Lustre: lfs01-OST0004: new disk, initializing
>>> Oct 10 08:07:14 oss-1 kernel: kjournald starting. Commit intervLustre: Server lfs01-OST0004 on device /dev/md16 has started al 5 seconds
>>> Oct 10 08:07:14 oss-1 kernel: LDISKFS FS on md16, external journal on md26
>>> Oct 10 08:07:14 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:07:14 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:07:14 oss-1 kernel: LDISKFS FS on md16, external journal on md26
>>> Oct 10 08:07:14 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:07:14 oss-1 kernel: LDISKFS-fs: file extents enabled
>>> Oct 10 08:07:14 oss-1 kernel: LDISKFS-fs: mballoc enabled
>>> Oct 10 08:07:14 oss-1 kernel: Lustre: lfs01-OST0004: new disk, initializing
>>> Oct 10 08:07:14 oss-1 kernel: Lustre: OST lfs01-OST0004 now serving dev (lfs01-OST0004/661dcb52-7ef9-8274-45d7-4441e36410d1) with recovery enabled
>>> Oct 10 08:07:14 oss-1 kernel: Lustre: Server lfs01-OST0004 on device /dev/md16 has started
>>> Lustre: lfs01-OST0004: received MDS connection from [EMAIL PROTECTED]
>>> Oct 10 08:07:18 oss-1 kernel: Lustre: lfs01-OST0004: received MDS connection from [EMAIL PROTECTED]
>>> LDISKFS-fs: file extents enabled
>>> LDISKFS-fs: mballoc enabled
>>> Lustre: lfs01-OST0005: new disk, initializing
>>> Lustre: Server lfs01-OST0005 on device /dev/md17 has started
>>> Oct 10 08:07:19 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:07:19 oss-1 kernel: LDISKFS FS on md17, external journal on md27
>>> Oct 10 08:07:19 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:07:19 oss-1 kernel: kjournald starting. Commit interval 5 seconds
>>> Oct 10 08:07:19 oss-1 kernel: LDISKFS FS on md17, external journal on md27
>>> Oct 10 08:07:19 oss-1 kernel: LDISKFS-fs: mounted filesystem with journal data mode.
>>> Oct 10 08:07:19 oss-1 kernel: LDISKFS-fs: file extents enabled
>>> Oct 10 08:07:20 oss-1 kernel: LDISKFS-fs: mballoc enabled
>>> Oct 10 08:07:20 oss-1 kernel: Lustre: lfs01-OST0005: new disk, initializing
>>> Oct 10 08:07:20 oss-1 kernel: Lustre: OST lfs01-OST0005 now serving dev (lfs01-OST0005/978ba68c-0ba7-9ac7-439f-964ca7bf86a3) with recovery enabled
>>> Oct 10 08:07:20 oss-1 kernel: Lustre: Server lfs01-OST0005 on device /dev/md17 has started
>>> Lustre: lfs01-OST0005: received MDS connection from [EMAIL PROTECTED]
>>> Oct 10 08:07:25 oss-1 kernel: Lustre: lfs01-OST0005: received MDS connection from [EMAIL PROTECTED]
>>> Oct 10 08:45:00 oss-1 faultmond: 17:Polling all 48 slots for drive fault
>>> Oct 10 08:45:06 oss-1 faultmond: Polling cycle 17 is complete
>>> Oct 10 09:45:06 oss-1 faultmond: 18:Polling all 48 slots for drive fault
>>> Oct 10 09:45:12 oss-1 faultmond: Polling cycle 18 is complete
>>> Oct 10 10:45:12 oss-1 faultmond: 19:Polling all 48 slots for drive fault
>>> Oct 10 10:45:17 oss-1 faultmond: Polling cycle 19 is complete
>>> LustreError: 18732:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0001: slow setattr 85s
>>> Oct 10 10:48:14 oss-1 kernel: LustreError: 18732:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0001: slow setattr 85s
>>> Oct 10 11:45:17 oss-1 faultmond: 20:Polling all 48 slots for drive fault
>>> Oct 10 11:45:25 oss-1 faultmond: Polling cycle 20 is complete
>>> Oct 10 12:45:25 oss-1 faultmond: 21:Polling all 48 slots for drive fault
>>> Oct 10 12:45:33 oss-1 faultmond: Polling cycle 21 is complete
>>> Lustre: 18805:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0005: slow setattr 33s
>>> Oct 10 13:14:46 oss-1 kernel: Lustre: 18805:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0005: slow setattr 33s
>>> Lustre: 18794:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0000: slow setattr 43s
>>> Oct 10 13:15:03 oss-1 kernel: Lustre: 18794:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0000: slow setattr 43s
>>> Lustre: 18815:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0004: slow setattr 40s
>>> Oct 10 13:15:13 oss-1 kernel: Lustre: 18815:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0004: slow setattr 40s
>>> Lustre: 18809:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0003: slow i_mutex 31s
>>> Lustre: 18753:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0003: slow i_mutex 31s
>>> Oct 10 13:15:25 oss-1 kernel: Lustre: 18809:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0003: slow i_mutex 31s
>>> Oct 10 13:15:25 oss-1 kernel: Lustre: 18753:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0003: slow i_mutex 31s
>>> Lustre: 18768:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 34s
>>> Lustre: 18768:0:(filter_io_26.c:700:filter_commitrw_write()) Skipped 2 previous similar messages
>>> Oct 10 13:15:28 oss-1 kernel: Lustre: 18768:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 34s
>>> Oct 10 13:15:28 oss-1 kernel: Lustre: 18768:0:(filter_io_26.c:700:filter_commitrw_write()) Skipped 2 previous similar messages
>>> Lustre: 18833:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0001: slow i_mutex 37s
>>> Oct 10 13:15:31 oss-1 kernel: Lustre: 18833:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0001: slow i_mutex 37s
>>> Lustre: 18812:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 40s
>>> Lustre: 18844:0:(filter_io_26.c:765:filter_commitrw_write()) lfs01-OST0003: slow direct_io 40s
>>> Oct 10 13:15:34 oss-1 kernel: Lustre: 18812:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 40s
>>> Oct 10 13:15:34 oss-1 kernel: Lustre: 18844:0:(filter_io_26.c:765:filter_commitrw_write()) lfs01-OST0003: slow direct_io 40s
>>> Lustre: 18741:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0001: slow setattr 41s
>>> Lustre: 18849:0:(filter_io_26.c:765:filter_commitrw_write()) lfs01-OST0001: slow direct_io 31s
>>> Oct 10 13:15:35 oss-1 kernel: Lustre: 18741:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0001: slow setattr 41s
>>> Oct 10 13:15:35 oss-1 kernel: Lustre: 18849:0:(filter_io_26.c:765:filter_commitrw_write()) lfs01-OST0001: slow direct_io 31s
>>> LustreError: 18765:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0002: slow setattr 51s
>>> Oct 10 13:15:38 oss-1 kernel: LustreError: 18765:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0002: slow setattr 51s
>>> Lustre: 18756:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 45s
>>> Oct 10 13:15:39 oss-1 kernel: Lustre: 18756:0:(filter_io_26.c:700:filter_commitrw_write()) lfs01-OST0002: slow i_mutex 45s
>>> Oct 10 13:45:33 oss-1 faultmond: 22:Polling all 48 slots for drive fault
>>> Oct 10 13:45:41 oss-1 faultmond: Polling cycle 22 is complete
>>> Oct 10 14:45:41 oss-1 faultmond: 23:Polling all 48 slots for drive fault
>>> Oct 10 14:45:49 oss-1 faultmond: Polling cycle 23 is complete
>>> Lustre: 18740:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0000: slow setattr 38s
>>> Oct 10 15:40:41 oss-1 kernel: Lustre: 18740:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0000: slow setattr 38s
>>> LustreError: 18830:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0004: slow setattr 60s
>>> Lustre: 18767:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0005: slow setattr 38s
>>> Oct 10 15:41:13 oss-1 kernel: LustreError: 18830:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0004: slow setattr 60s
>>> Oct 10 15:41:13 oss-1 kernel: Lustre: 18767:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0005: slow setattr 38s
>>> Lustre: 18796:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0001: slow setattr 44s
>>> Oct 10 15:41:20 oss-1 kernel: Lustre: 18796:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0001: slow setattr 44s
>>> LustreError: 18831:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0002: slow setattr 62s
>>> Oct 10 15:41:21 oss-1 kernel: LustreError: 18831:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0002: slow setattr 62s
>>> Oct 10 15:45:49 oss-1 faultmond: 24:Polling all 48 slots for drive fault
>>> Oct 10 15:45:58 oss-1 faultmond: Polling cycle 24 is complete
>>> Oct 10 16:45:58 oss-1 faultmond: 25:Polling all 48 slots for drive fault
>>> Oct 10 16:46:06 oss-1 faultmond: Polling cycle 25 is complete
>>> Oct 10 17:46:06 oss-1 faultmond: 26:Polling all 48 slots for drive fault
>>> Oct 10 17:46:15 oss-1 faultmond: Polling cycle 26 is complete
>>> Lustre: 18741:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0000: slow setattr 41s
>>> Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s [EMAIL PROTECTED] x15789/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>>> Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s [EMAIL PROTECTED] x15790/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>>> Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) Skipped 3 previous similar messages
>>> Lustre: 18764:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0004: slow setattr 40s
>>> Oct 10 18:06:33 oss-1 kernel: Lustre: 18741:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0000: slow setattr 41s
>>> Oct 10 18:06:33 oss-1 kernel: Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s [EMAIL PROTECTED] x15789/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>>> Oct 10 18:06:33 oss-1 kernel: Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 7s [EMAIL PROTECTED] x15790/t0 o13-><?>@<?>:0/0 lens 128/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>>> Lustre: 18845:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0002: slow setattr 44s
>>> Lustre: 18579:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 14s [EMAIL PROTECTED] x7271650/t0 o103-><?>@<?>:0/0 lens 232/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>>> Oct 10 18:06:54 oss-1 kernel: Lustre: 18726:0:(service.c:918:ptlrpc_server_handle_req_in()) Skipped 3 previous similar messages
>>> Oct 10 18:06:54 oss-1 kernel: Lustre: 18764:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0004: slow setattr 40s
>>> Oct 10 18:06:54 oss-1 kernel: Lustre: 18845:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0002: slow setattr 44s
>>> Oct 10 18:06:54 oss-1 kernel: Lustre: 18579:0:(service.c:918:ptlrpc_server_handle_req_in()) @@@ Slow req_in handling 14s [EMAIL PROTECTED] x7271650/t0 o103-><?>@<?>:0/0 lens 232/0 e 0 to 0 dl 0 ref 1 fl New:/0/0 rc 0/0
>>> Lustre: 18766:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0005: slow setattr 32s
>>> Lustre: 18766:0:(lustre_fsfilt.h:312:fsfilt_setattr()) Skipped 1 previous similar message
>>> Oct 10 18:06:59 oss-1 kernel: Lustre: 18766:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0005: slow setattr 32s
>>> Oct 10 18:06:59 oss-1 kernel: Lustre: 18766:0:(lustre_fsfilt.h:312:fsfilt_setattr()) Skipped 1 previous similar message
>>> Lustre: 18826:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0003: slow setattr 45s
>>> Oct 10 18:07:04 oss-1 kernel: Lustre: 18826:0:(lustre_fsfilt.h:312:fsfilt_setattr()) lfs01-OST0003: slow setattr 45s
>>> Oct 10 18:46:15 oss-1 faultmond: 27:Polling all 48 slots for drive fault
>>> ----------- [cut here ] --------- [please bite here ] ---------
>>> Kernel BUG at spinlock:76
>>> invalid operand: 0000 [1] SMP
>>> CPU 2
>>> Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lvfs(U) ldiskfs(U) lnet(U) libcfs(U) raid5(U) xor(U) parport_pc(U) lp(U) parport(U) autofs4(U) i2c_dev(U) i2c_core(U) ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U) sunrpc(U) rdma_ucm(U) qlgc_vnic(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) md5(U) ipv6(U) iw_cxgb3(U) cxgb3(U) ib_ipath(U) mlx4_ib(U) mlx4_core(U) ds(U) yenta_socket(U) pcmcia_core(U) dm_mirror(U) dm_multipath(U) dm_mod(U) button(U) battery(U) ac(U) joydev(U) ohci_hcd(U) ehci_hcd(U) hw_random(U) edac_mc(U) ib_mthca(U) ib_umad(U) ib_ucm(U) ib_uverbs(U) ib_cm(U) ib_sa(U) ib_mad(U) ib_core(U) e1000(U) ext3(U) jbd(U) raid1(U) mv_sata(U) sd_mod(U) scsi_mod(U)
>>> _______________________________________________
>>> Lustre-discuss mailing list
>>> Lustre-discuss@lists.lustre.org
>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss