here is logs with time stamp Primary OSS
Jan 31 16:03:08 typhoon kernel: Lustre: 1689:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 32s Jan 31 16:03:08 typhoon kernel: Lustre: 1689:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw commit 32s Jan 31 16:03:12 typhoon kernel: Lustre: 1712:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 36s Jan 31 16:03:12 typhoon kernel: Lustre: 1712:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw commit 36s Jan 31 16:03:22 typhoon kernel: Lustre: 1695:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 46s Jan 31 16:03:22 typhoon kernel: Lustre: 1695:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw commit 46s Jan 31 16:19:19 typhoon kernel: Lustre: 1677:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 30s Jan 31 16:19:19 typhoon kernel: Lustre: 1677:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw commit 30s Jan 31 16:28:54 typhoon kernel: Lustre: 1712:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 46s Jan 31 16:28:54 typhoon kernel: Lustre: 1712:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw commit 46s Jan 31 16:32:51 typhoon kernel: Lustre: 1470:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from 12345-10.65.200.37@tcp portal 16 match 10971425 offset 0 length 64: 2 Jan 31 16:32:56 typhoon kernel: Lustre: 1470:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from 12345-10.65.200.37@tcp portal 16 match 10971427 offset 0 length 64: 2 Jan 31 16:34:23 typhoon kernel: Lustre: 1707:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw commit 31s Jan 31 16:34:29 typhoon kernel: Lustre: 1692:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw commit 31s Jan 31 16:34:29 typhoon kernel: Lustre: 1692:0:(filter_io_26.c:727:filter_commitrw_write()) Skipped 1 previous similar message Jan 31 16:50:19 typhoon kernel: Lustre: 1674:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 34s Jan 31 16:50:19 typhoon kernel: Lustre: 1674:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw commit 34s Jan 31 16:50:19 typhoon kernel: Lustre: 1674:0:(filter_io_26.c:727:filter_commitrw_write()) Skipped 5 previous similar messages Jan 31 16:58:38 typhoon kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 1731: it was inactive for 100000ms Jan 31 16:58:38 typhoon kernel: Lustre: 0:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack for process 1731 Jan 31 16:58:38 typhoon kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 1683: it was inactive for 100000ms Jan 31 16:58:38 typhoon kernel: Lustre: 0:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack for process 1683 Jan 31 16:59:04 typhoon kernel: Lustre: 1731:0:(watchdog.c:311:lcw_update_time()) Expired watchdog for pid 1731 disabled after 125597293.0001s Jan 31 16:59:04 typhoon kernel: Lustre: 1683:0:(watchdog.c:311:lcw_update_time()) Expired watchdog for pid 1683 disabled after 125576812.0001s Jan 31 17:04:39 typhoon kernel: Lustre: 1701:0:(filter_io_26.c:714:filter_commitrw_write()) ost1: slow direct_io 36s Jan 31 17:04:39 typhoon kernel: Lustre: 1701:0:(filter_io_26.c:727:filter_commitrw_write()) ost1: slow commitrw commit 36s secondat OSS Jan 31 16:19:08 cyclops kernel: Lustre: 1703:0:(filter_io_26.c:714:filter_commitrw_write()) ost2: slow direct_io 34s Jan 31 16:19:08 cyclops kernel: Lustre: 1703:0:(filter_io_26.c:727:filter_commitrw_write()) ost2: slow commitrw commit 34s Jan 31 16:19:43 cyclops kernel: Lustre: 1700:0:(filter_io_26.c:714:filter_commitrw_write()) ost2: slow direct_io 30s Jan 31 16:19:43 cyclops kernel: Lustre: 1700:0:(filter_io_26.c:727:filter_commitrw_write()) ost2: slow commitrw commit 30s Jan 31 16:24:51 cyclops kernel: Lustre: 1706:0:(filter_io_26.c:714:filter_commitrw_write()) ost2: slow direct_io 30s Jan 31 16:24:51 cyclops kernel: Lustre: 1706:0:(filter_io_26.c:727:filter_commitrw_write()) ost2: slow commitrw commit 30s Jan 31 16:27:11 cyclops kernel: Lustre: 1691:0:(filter_io_26.c:714:filter_commitrw_write()) ost2: slow direct_io 30s Jan 31 16:27:11 cyclops kernel: Lustre: 1691:0:(filter_io_26.c:727:filter_commitrw_write()) ost2: slow commitrw commit 30s Jan 31 16:29:35 cyclops kernel: Lustre: 1728:0:(filter_io_26.c:714:filter_commitrw_write()) ost2: slow direct_io 40s Jan 31 16:29:35 cyclops kernel: Lustre: 1728:0:(filter_io_26.c:727:filter_commitrw_write()) ost2: slow commitrw commit 40s Jan 31 16:32:40 cyclops kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 1667: it was inactive for 100000ms Jan 31 16:32:40 cyclops kernel: Lustre: 0:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack for process 1667 Jan 31 16:32:44 cyclops kernel: Lustre: 1667:0:(watchdog.c:311:lcw_update_time()) Expired watchdog for pid 1667 disabled after 103346449.0001s Jan 31 16:32:44 cyclops kernel: Lustre: 1667:0:(watchdog.c:311:lcw_update_time()) Skipped 1 previous similar message Jan 31 16:33:05 cyclops kernel: Lustre: 1615:0:(ldlm_lib.c:486:target_handle_reconnect()) ost2: 7c2f0_lov1_34edb7f8f7 reconnecting Jan 31 16:33:42 cyclops kernel: Lustre: 1718:0:(filter_io_26.c:714:filter_commitrw_write()) ost2: slow direct_io 36s Jan 31 16:33:42 cyclops kernel: Lustre: 1718:0:(filter_io_26.c:727:filter_commitrw_write()) ost2: slow commitrw commit 36s Jan 31 16:58:42 cyclops kernel: Lustre: 1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from 12345-10.65.200.37@tcp portal 16 match 10726907 offset 0 length 112: 2 Jan 31 16:58:47 cyclops kernel: Lustre: 1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from 12345-10.65.200.37@tcp portal 16 match 10726911 offset 0 length 64: 2 Jan 31 16:58:47 cyclops kernel: Lustre: 1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from 12345-10.65.200.37@tcp portal 16 match 10726921 offset 0 length 64: 2 Jan 31 16:58:47 cyclops kernel: Lustre: 1465:0:(lib-move.c:1644:lnet_parse_put()) Skipped 1 previous similar message Jan 31 16:58:47 cyclops kernel: Lustre: 1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from 12345-10.65.200.37@tcp portal 16 match 10726923 offset 0 length 64: 2 Jan 31 16:58:47 cyclops kernel: Lustre: 1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from 12345-10.65.200.37@tcp portal 16 match 10726926 offset 0 length 64: 2 Jan 31 16:58:47 cyclops kernel: Lustre: 1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from 12345-10.65.200.37@tcp portal 16 match 10726929 offset 0 length 64: 2 Jan 31 16:58:57 cyclops kernel: Lustre: 1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from 12345-10.65.200.37@tcp portal 16 match 10726969 offset 0 length 64: 2 Jan 31 16:59:02 cyclops kernel: Lustre: 1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from 12345-10.65.200.37@tcp portal 16 match 10726992 offset 0 length 64: 2 Jan 31 16:59:02 cyclops kernel: Lustre: 1465:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from 12345-10.65.200.37@tcp portal 16 match 10726994 offset 0 length 64: 2 Client Jan 31 16:32:40 hades kernel: Lustre: Changing connection for OSC_hades.example.com_ost2_MNT_client to typhoon_UUID/10.65.200.21@tcp Jan 31 16:32:44 hades kernel: Lustre: 2118:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from 12345-10.65.200.30@tcp portal 4 match 57154751 offset 0 length 288: 2 Jan 31 16:32:44 hades kernel: Lustre: 2118:0:(lib-move.c:1644:lnet_parse_put()) Skipped 1 previous similar message Jan 31 16:33:05 hades kernel: Lustre: Changing connection for OSC_hades.example.com_ost2_MNT_client to cyclops_UUID/10.65.200.30@tcp Jan 31 16:33:05 hades kernel: Lustre: OSC_hades.example.com_ost2_MNT_client: Connection restored to service ost2 using nid 10.65.200.30@tcp. Jan 31 16:33:12 hades kernel: Lustre: Changing connection for OSC_hades.example.com_ost2_MNT_client to typhoon_UUID/10.65.200.21@tcp Jan 31 16:33:55 hades kernel: Lustre: Changing connection for OSC_hades.example.com_ost2_MNT_client to cyclops_UUID/10.65.200.30@tcp Jan 31 16:33:58 hades kernel: Lustre: OSC_hades.example.com_ost2_MNT_client: Connection restored to service ost2 using nid 10.65.200.30@tcp. Jan 31 16:58:38 hades kernel: Lustre: Changing connection for OSC_hades.example.com_ost1_MNT_client to cyclops_UUID/10.65.200.30@tcp Jan 31 16:59:03 hades kernel: Lustre: Changing connection for OSC_hades.example.com_ost1_MNT_client to typhoon_UUID/10.65.200.21@tcp Jan 31 16:59:05 hades kernel: Lustre: 2121:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from 12345-10.65.200.21@tcp portal 4 match 57400851 offset 0 length 280: 2 Jan 31 16:59:05 hades kernel: Lustre: 2121:0:(lib-move.c:1644:lnet_parse_put()) Dropping PUT from 12345-10.65.200.21@tcp portal 4 match 57396169 offset 0 length 280: 2 Jan 31 16:59:06 hades kernel: Lustre: OSC_hades.example.com_ost1_MNT_client: Connection restored to service ost1 using nid 10.65.200.21@tcp. Jan 31 18:43:31 hades kernel: Lustre: Acceptor stopping Jan 31 18:43:33 hades kernel: Lustre: Removed LNI 10.65.200.37@tcp Jan 31 18:44:37 hades kernel: Lustre: 1145:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192 Jan 31 18:44:38 hades kernel: Lustre: OBD class driver Build Version: 1.4.9.1-19691231170000-PRISTINE-.testsuite.tmp.lbuild-boulder.lbuild-v1_4_9_1-2.6-rhel4-i686.lbuild.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.0.8.EL_lustre.1.4.9.1smp, [email protected] Jan 31 18:44:38 hades kernel: Lustre: Added LNI 10.65.200.37@tcp [8/256] Jan 31 18:44:38 hades kernel: Lustre: Accept secure, port 988 Jan 31 18:44:38 hades kernel: Lustre: Lustre Lite Client File System; [email protected] Jan 31 18:54:58 hades kernel: Lustre: Acceptor stopping Jan 31 18:55:00 hades kernel: Lustre: Removed LNI 10.65.200.37@tcp Jan 31 19:00:25 hades kernel: Lustre: 2680:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192 Jan 31 19:00:25 hades kernel: Lustre: OBD class driver Build Version: 1.4.9.1-19691231170000-PRISTINE-.testsuite.tmp.lbuild-boulder.lbuild-v1_4_9_1-2.6-rhel4-i686.lbuild.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-42.0.8.EL_lustre.1.4.9.1smp, [email protected] Jan 31 19:00:25 hades kernel: Lustre: Added LNI 10.65.200.37@tcp [8/256] Jan 31 19:00:25 hades kernel: Lustre: Accept secure, port 988 Jan 31 19:00:25 hades kernel: Lustre: Lustre Lite Client File System; [email protected] Plz suggest now. On Mon, Jan 31, 2011 at 11:55 PM, Brian J. Murrell <[email protected]>wrote: > On Mon, 2011-01-31 at 16:22 +0500, Nauman Yousuf wrote: > > hi > > Hi, > > > what thiserror means > > Same as I told you last time. (a) it's difficult to gather context > without timestamps and (b) lacking any further information it looks like > your storage target is overloaded. > > Again, I would posit that this is likely because your storage target is > rebuilding and you are expecting the same kind of performance in that > degraded state that you get when the mirror is a non-degraded state. > > You need to tune your cluster to operate within the performance > parameters of degraded storage targets. The manual covers this kind of > tuning. > > b. > > > _______________________________________________ > Lustre-discuss mailing list > [email protected] > http://lists.lustre.org/mailman/listinfo/lustre-discuss > > -- Regards Nauman Yousuf 0321-2549206 E-Eager, N-Noble, G-Genuine, I-Intelligent, N-Natural, E-Enthusiastic, E-Energetic, R-Resourcefull --- ENGINEER
_______________________________________________ Lustre-discuss mailing list [email protected] http://lists.lustre.org/mailman/listinfo/lustre-discuss
