An update regarding this: I have noticed that on this OST (wurfs-OST001b) the OI scrub gets launched every ~7 seconds:

[root@storage06 wurfs-OST001b]# cat oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: completed
flags:
param:
time_since_last_completed: 8 seconds
time_since_latest_start: 8 seconds
time_since_last_checkpoint: 8 seconds
latest_start_position: 12
last_checkpoint_position: 30515713
first_failure_position: N/A
checked: 3417
updated: 0
failed: 0
prior_updated: 0
noscrub: 0
igif: 1
success_count: 2526979
run_time: 0 seconds
average_speed: 3417 objects/sec
real-time_speed: N/A
current_position: N/A
lf_scanned: 0
lf_repaired: 0
lf_failed: 0

[root@storage06 wurfs-OST001b]# cat oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: completed
flags:
param:
time_since_last_completed: 2 seconds
time_since_latest_start: 2 seconds
time_since_last_checkpoint: 2 seconds
latest_start_position: 12
last_checkpoint_position: 30515713
first_failure_position: N/A
checked: 3417
updated: 0
failed: 0
prior_updated: 0
noscrub: 0
igif: 1
success_count: 2526980
run_time: 0 seconds
average_speed: 3417 objects/sec
real-time_speed: N/A
current_position: N/A
lf_scanned: 0
lf_repaired: 0
lf_failed: 0
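A quick way to confirm that a new scrub run really happened between two reads is to compare two snapshots of oi_scrub instead of eyeballing them. The following is only a minimal sketch: `parse_oi_scrub`, `seconds` and `scrub_restarted` are hypothetical helper names made up for this example (not Lustre tools), and it assumes the `key: value` layout shown in the output above.

```python
def parse_oi_scrub(text):
    """Parse 'key: value' lines from oi_scrub output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as the shell prompt
            fields[key.strip()] = value.strip()
    return fields


def seconds(value):
    """Convert a value such as '8 seconds' to an int; return None for 'N/A' or empty."""
    parts = value.split()
    return int(parts[0]) if parts and parts[0].isdigit() else None


def scrub_restarted(before, after):
    """Return True if the snapshots indicate a new scrub run in between.

    Assumption of this sketch: a growing success_count, or a smaller
    time_since_latest_start, means the scrub was started again.
    """
    runs = int(after["success_count"]) - int(before["success_count"])
    t0 = seconds(before["time_since_latest_start"])
    t1 = seconds(after["time_since_latest_start"])
    clock_reset = t0 is not None and t1 is not None and t1 < t0
    return runs > 0 or clock_reset


# Values copied from the two snapshots above (only the relevant fields).
first = parse_oi_scrub("""\
status: completed
time_since_latest_start: 8 seconds
success_count: 2526979
""")
second = parse_oi_scrub("""\
status: completed
time_since_latest_start: 2 seconds
success_count: 2526980
""")
print(scrub_restarted(first, second))
```

In practice the two texts would come from reading the oi_scrub file twice, a few seconds apart.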
And, dumping the logs from the ring buffer, I see:

00080000:02000400:24.0:1489665812.888068:0:35949:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:24.0:1489665812.888083:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:27.0:1489665812.923388:0:40057:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:27.0:1489665812.923400:0:40057:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:24.0:1489665822.903706:0:35949:0:(ofd_dev.c:1747:ofd_create_hdl()) wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:27.0:1489665822.903984:0:40212:0:(osd_scrub.c:660:osd_scrub_prep()) wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:27.0:1489665822.903992:0:40212:0:(osd_scrub.c:278:osd_scrub_file_reset()) wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:27.0:1489665822.904016:0:40212:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:24.0:1489665822.904062:0:35949:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:24.0:1489665822.904079:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:27.0:1489665822.940373:0:40212:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:27.0:1489665822.940385:0:40212:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:8.0:1489665832.919771:0:10464:0:(ofd_dev.c:1747:ofd_create_hdl()) wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:20.0:1489665832.920031:0:40406:0:(osd_scrub.c:660:osd_scrub_prep()) wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:20.0:1489665832.920037:0:40406:0:(osd_scrub.c:278:osd_scrub_file_reset()) wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:20.0:1489665832.920057:0:40406:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:8.0:1489665832.920094:0:10464:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:8.0:1489665832.920113:0:10464:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:20.0:1489665832.955088:0:40406:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:20.0:1489665832.955101:0:40406:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:30.0:1489665842.935720:0:35960:0:(ofd_dev.c:1747:ofd_create_hdl()) wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:27.0:1489665842.936008:0:40553:0:(osd_scrub.c:660:osd_scrub_prep()) wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:27.0:1489665842.936015:0:40553:0:(osd_scrub.c:278:osd_scrub_file_reset()) wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:27.0:1489665842.936038:0:40553:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:30.0:1489665842.936081:0:35960:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:30.0:1489665842.936096:0:35960:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:27.0:1489665842.972129:0:40553:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:27.0:1489665842.972141:0:40553:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:10.0:1489665852.951770:0:35949:0:(ofd_dev.c:1747:ofd_create_hdl()) wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:18.0:1489665852.951986:0:40838:0:(osd_scrub.c:660:osd_scrub_prep()) wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:18.0:1489665852.951992:0:40838:0:(osd_scrub.c:278:osd_scrub_file_reset()) wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:18.0:1489665852.952017:0:40838:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:10.0:1489665852.952060:0:35949:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:10.0:1489665852.952089:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:18.0:1489665852.987792:0:40838:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:18.0:1489665852.987804:0:40838:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1
00002000:00080000:8.0:1489665862.967664:0:35949:0:(ofd_dev.c:1747:ofd_create_hdl()) wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:27.0:1489665862.967948:0:41207:0:(osd_scrub.c:660:osd_scrub_prep()) wurfs-OST001b: OI scrub prep, flags = 0x4e
00100000:10000000:27.0:1489665862.967955:0:41207:0:(osd_scrub.c:278:osd_scrub_file_reset()) wurfs-OST001b: reset OI scrub file, old flags = 0x0, add flags = 0x0
00100000:10000000:27.0:1489665862.967982:0:41207:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:8.0:1489665862.968024:0:35949:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00002000:00020000:8.0:1489665862.968040:0:35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
00100000:10000000:27.0:1489665863.004087:0:41207:0:(osd_scrub.c:758:osd_scrub_post()) wurfs-OST001b: OI scrub post, result = 1
00100000:10000000:27.0:1489665863.004098:0:41207:0:(osd_scrub.c:1520:osd_scrub_main()) wurfs-OST001b: OI scrub: stop, pos = 30515713: rc = 1

I tried to see where that FID leads, but it seems that the file doesn't actually exist (the customer has moved everything away from this OST):

[root@nfs01 ~]# lfs fid2path wurfs "[0x1001b0000:0x19a5c22:0x0]"
ioctl err -22: Invalid argument (22)
fid2path: error on FID [0x1001b0000:0x19a5c22:0x0]: Invalid argument

Not sure how to proceed from here.

On 16 March 2017 at 11:03, Andrea del Monaco <[email protected]> wrote:

> Dear all,
>
> We are facing an issue with one OST.
> We have stopped pacemaker on the storage06 (which is the one that has that
> resource running):
> [root@storage06 log]# pcs status | grep 1b
> storage-ost001b (ocf::heartbeat:Filesystem): Started storage06.failover.cluster
> storage-ost001b_monitor_120000 on storage06.failover.cluster 'not running' (7): call=295, status=complete, exitreason='none'
> *
> And then we have tried to execute e2fsck -n /dev/mapper/ost001b.
> The e2fsck has reported nothing to be repaired.
> Today, I noticed that there are still errors and we can't create files on
> this OST:
> [Mon Mar 13 18:36:44 2017] LustreError: 42126:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
> [Mon Mar 13 18:46:44 2017] LustreError: 35949:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
> [Mon Mar 13 18:56:44 2017] LustreError: 26996:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
> [Mon Mar 13 19:06:45 2017] LustreError: 26989:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 03:37:13 2017] LustreError: 26995:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 03:47:13 2017] LustreError: 44782:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 03:57:14 2017] LustreError: 35964:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 04:07:14 2017] LustreError: 35964:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 04:17:14 2017] LustreError: 26994:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 04:27:15 2017] LustreError: 27006:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 04:37:15 2017] LustreError: 27006:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 04:47:15 2017] LustreError: 35964:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
> [Tue Mar 14 07:07:30 2017] LustreError: 35960:0:(ofd_dev.c:1781:ofd_create_hdl()) wurfs-OST001b: unable to precreate: rc = -115
> Looking at /usr/include/asm-generic/errno.h, it seems that the error refers
> to:
> #define EINPROGRESS 115 /* Operation now in progress */
> #define ESTALE 116 /* Stale file handle */
> (on some other OSTs we do have error 116 as well)
>
> Any idea about what to do next?
>
> I will increase the verbosity and dump the logs from the ring buffer.
>
> Kind regards,
> --
> Andrea Del Monaco
> Internal Engineer
>
> Skype: delmonaco.andrea
> [email protected]
>
> ClusterVision BV
> Gyroscoopweg 56
> 1042 AC Amsterdam
> The Netherlands
> Tel: +31 20 407 7550
> Fax: +31 84 759 8389
> www.clustervision.com

--
Andrea Del Monaco
Internal Engineer

Skype: delmonaco.andrea
[email protected]

ClusterVision BV
Gyroscoopweg 56
1042 AC Amsterdam
The Netherlands
Tel: +31 20 407 7550
Fax: +31 84 759 8389
www.clustervision.com
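The errno lookup in the quoted message can also be done without opening errno.h. This is only a small sketch using Python's standard `errno` and `os` modules; `decode_rc` is a hypothetical helper name, and the numeric values (115, 116, 22) are the Linux ones seen in this thread, so the names printed on another platform may differ.

```python
import errno
import os


def decode_rc(rc):
    """Map a kernel-style negative return code to (symbolic name, message)."""
    code = abs(rc)
    # errno.errorcode maps numbers to names for the platform Python runs on.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# Return codes seen in this thread: -115 and -116 from the OSS logs,
# -22 from the lfs fid2path ioctl error. Values are Linux-specific.
for rc in (-115, -116, -22):
    name, message = decode_rc(rc)
    print(f"rc = {rc}: {name} ({message})")
```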
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
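For the ring-buffer dump earlier in this message, the restart interval and the relation between the FID and the precreate position can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, the regular expressions assume the exact line layout of the dump in this thread (fourth colon-separated field is a seconds.microseconds timestamp), and the sample is a handful of lines copied from it.

```python
import re

# Assumed layout: subsystem:mask:cpu:timestamp:... as in the dump in this thread.
START_RE = re.compile(r"^\w+:\w+:[\d.]+:(\d+\.\d+):.*OI scrub start")
FID_RE = re.compile(
    r"trigger OI scrub by RPC for \[(0x[0-9a-f]+):(0x[0-9a-f]+):(0x[0-9a-f]+)\]"
)
RESERVE_RE = re.compile(r"reserve (\d+) objects in group 0x[0-9a-f]+ at (\d+)")


def scrub_start_intervals(lines):
    """Seconds between consecutive 'OI scrub start' messages."""
    stamps = [float(m.group(1)) for m in map(START_RE.search, lines) if m]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


def triggering_oids(lines):
    """Object IDs (second FID component, as integers) that triggered a scrub."""
    return {int(m.group(2), 16) for m in map(FID_RE.search, lines) if m}


def precreate_positions(lines):
    """Object IDs at which ofd_create_hdl() reports reserving objects."""
    return {int(m.group(2)) for m in map(RESERVE_RE.search, lines) if m}


LOG = """\
00002000:00080000:24.0:1489665822.903706:0:35949:0:(ofd_dev.c:1747:ofd_create_hdl()) wurfs-OST001b: reserve 64 objects in group 0x0 at 26893346
00100000:10000000:27.0:1489665822.904016:0:40212:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:24.0:1489665822.904062:0:35949:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
00100000:10000000:20.0:1489665832.920057:0:40406:0:(osd_scrub.c:1510:osd_scrub_main()) wurfs-OST001b: OI scrub start, flags = 0x4e, pos = 12
00080000:02000400:8.0:1489665832.920094:0:10464:0:(osd_handler.c:860:osd_fid_lookup()) wurfs-OST001b-os: trigger OI scrub by RPC for [0x1001b0000:0x19a5c22:0x0], rc = 0 [1]
"""

lines = LOG.splitlines()
print([round(gap, 3) for gap in scrub_start_intervals(lines)])
print(triggering_oids(lines) == precreate_positions(lines))
```

On the lines in this thread, consecutive scrub starts are roughly 10 seconds apart, and the object ID in the FID (0x19a5c22) equals the precreate position 26893346. That suggests, without proving it, that the lookup of the first object to be precreated is what re-triggers the scrub on each cycle.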
