Hello Andreas,

Have you had the opportunity to look at the logs that I sent you?
Thanks in advance.

Best Regards,
Sidiney

On Thu, 3 May 2018 at 12:34, Sidiney Crescencio <[email protected]> wrote:
> Hello Andreas,
>
> Thanks for your answer.
>
> [root@storage06 ~]# debugfs -c -R "stat O/0/d$((0x1bfc24c % 32))/$((0x1bfc24c))" /dev/mapper/ost001c | grep -i fid
> debugfs 1.42.13.wc6 (05-Feb-2017)
> /dev/mapper/ost001c: catastrophic mode - not reading inode or group bitmaps
> lma: fid=[0x100000000:0x1bfc2c7:0x0] compat=8 incompat=0
> fid = "18 93 02 00 0b 00 00 00 3c c2 01 00 00 00 00 00 " (16)
> fid: parent=[0xb00029318:0x1c23c:0x0] stripe=0
>
> [root@node024 ~]# lfs fid2path /lustre/ 0x100000000:0x1bfc2c7:0x0
> ioctl err -22: Invalid argument (22)
> fid2path: error on FID 0x100000000:0x1bfc2c7:0x0: Invalid argument
>
> [root@node024 ~]# lfs fid2path /lustre/ 0xb00029318:0x1c23c:0x0
> fid2path: error on FID 0xb00029318:0x1c23c:0x0: No such file or directory
>
> Am I doing this right? I think so; it actually looks like the file is already gone, as I thought at first.
>
> About the hung thread: I've filtered the logs like this and couldn't find anything that might indicate the issue. What else can we check to solve this error?
>
> [root@storage06 ~]# cat /var/log/messages* | grep -i OST001c | grep -v destroying | grep -v scrub
> Apr 30 11:01:13 storage06 kernel: Lustre: wurfs-OST001c: Connection restored to e9153718-f82d-d90b-268a-e8c9a5e3af1c (at 192.168.2.19@o2ib)
> May 2 15:54:17 storage06 kernel: Lustre: wurfs-OST001c: haven't heard from client 9c4b82f6-a2a7-3488-c2b3-cabb9cf333e5 (at 192.168.2.25@o2ib) in 1352 seconds. I think it's dead, and I am evicting it. exp ffff8804c451e000, cur 1525269257 expire 1525268357 last 1525267905
> Apr 5 10:11:43 storage06 kernel: Lustre: wurfs-OST001c: haven't heard from client c1966b99-1299-9da0-3280-bd6ad84f8f27 (at 192.168.2.51@o2ib) in 1352 seconds. I think it's dead, and I am evicting it. exp ffff8804c4519800, cur 1522915903 expire 1522915003 last 1522914551
> Apr 5 10:44:20 storage06 kernel: Lustre: wurfs-OST001c: Connection restored to 7fbdaa81-10cb-2464-f981-883bee1f6fdf (at 192.168.2.21@o2ib)
> Apr 5 10:59:52 storage06 kernel: Lustre: wurfs-OST001c: Connection restored to aef29b00-0042-9f5e-da17-3bd3b655e13d (at 192.168.2.2@o2ib)
> Apr 5 11:09:59 storage06 kernel: Lustre: wurfs-OST001c: haven't heard from client c4cec4f1-b994-2ad2-be36-196b9f5c1b76 (at 192.168.2.161@o2ib) in 1352 seconds. I think it's dead, and I am evicting it. exp ffff88059a0a2400, cur 1522919399 expire 1522918499 last 1522918047
> Apr 14 14:58:02 storage06 kernel: LustreError: 0:0:(ldlm_lockd.c:342:waiting_locks_callback()) ### lock callback timer expired after 377s: evicting client at 192.168.2.33@o2ib ns: filter-wurfs-OST001c_UUID lock: ffff880bbf72dc00/0xb64a498f40bc086 lrc: 4/0,0 mode: PR/PR res: [0x38ee37e:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000400010020 nid: 192.168.2.33@o2ib remote: 0x73aa9e5b8c684dc5 expref: 328 pid: 39172 timeout: 16574376013 lvb_type: 1
> Apr 14 15:05:56 storage06 kernel: Lustre: wurfs-OST001c: Client wurfs-MDT0000-mdtlov_UUID (at 192.168.2.182@o2ib) reconnecting
> Apr 14 15:05:56 storage06 kernel: Lustre: wurfs-OST001c: Connection restored to 192.168.2.182@o2ib (at 192.168.2.182@o2ib)
> Apr 14 15:05:56 storage06 kernel: Lustre: wurfs-OST001c: deleting orphan objects from 0x0:59696086 to 0x0:59705564
> Apr 15 15:38:28 storage06 kernel: Lustre: wurfs-OST001c: haven't heard from client a21e3dcc-af43-1dc2-b552-ca341a6b5e77 (at 192.168.2.5@o2ib) in 1352 seconds. I think it's dead, and I am evicting it. exp ffff880629717000, cur 1523799508 expire 1523798608 last 1523798156
> Apr 15 16:01:07 storage06 kernel: Lustre: wurfs-OST001c: haven't heard from client c931b18c-e0cf-4a0c-d95f-9a8cf60f3b3f (at 192.168.2.36@o2ib) in 1352 seconds. I think it's dead, and I am evicting it. exp ffff880d3fcfdc00, cur 1523800867 expire 1523799967 last 1523799515
> Apr 15 18:45:35 storage06 kernel: Lustre: wurfs-OST001c: haven't heard from client af5f8ac5-fb5d-cd1c-cf97-b755700778bc (at 192.168.2.9@o2ib) in 1352 seconds. I think it's dead, and I am evicting it. exp ffff8807fed8d000, cur 1523810735 expire 1523809835 last 1523809383
> Apr 16 09:04:27 storage06 kernel: Lustre: 39169:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1523862169/real 1523862267] req@ffff8809e5746300 x1584854319120496/t0(0) o104->[email protected]@o2ib:15/16 lens 296/224 e 0 to 1 dl 1523862736 ref 1 fl Rpc:X/2/ffffffff rc 0/-1
> Apr 16 09:44:18 storage06 kernel: Lustre: wurfs-OST001c: Connection restored to 2e95bceb-837d-5518-9198-48dd0b2b9a83 (at 192.168.2.40@o2ib)
> Apr 16 09:53:26 storage06 kernel: Lustre: wurfs-OST001c: Connection restored to 07eac249-8012-fe49-1037-3920d06e1403 (at 192.168.2.38@o2ib)
> Apr 16 09:55:48 storage06 kernel: Lustre: wurfs-OST001c: Connection restored to 3df9306f-8024-c85f-8d42-3ad863a3f4c0 (at 192.168.2.171@o2ib)
> Apr 16 10:11:25 storage06 kernel: Lustre: wurfs-OST001c: Connection restored to d9a56a18-c51e-2b0c-561d-3b0fa31ca8f7 (at 192.168.2.12@o2ib)
> Apr 16 10:12:06 storage06 kernel: Lustre: wurfs-OST001c: Connection restored to c00bd597-31b4-ded9-fd06-d02500010dad (at 192.168.2.172@o2ib)
> Apr 16 13:50:44 storage06 kernel: Lustre: wurfs-OST001c: haven't heard from client 4d69154c-ca88-ce45-23f7-ff76f1a6423f (at 192.168.2.14@o2ib) in 1352 seconds. I think it's dead, and I am evicting it. exp ffff8804c4678800, cur 1523879444 expire 1523878544 last 1523878092
>
> Many thanks.
>
> On 2 May 2018 at 20:16, Dilger, Andreas <[email protected]> wrote:
>> This is an OST FID, so you would need to get the parent MDT FID to be able to resolve the pathname.
>>
>> Assuming an ldiskfs OST, you can use:
>>
>> 'debugfs -c -R "stat O/0/d$((0x1bfc24c % 32))/$((0x1bfc24c))" LABEL=wurfs-OST001c'
>>
>> to get the parent FID, then run "lfs fid2path /mnt/wurfs <FID>" on a client to find the path.
>>
>> That said, the -115 error is "-EINPROGRESS", which means the OST thinks it is already trying to do this. Maybe a hung OST thread?
>>
>> Cheers, Andreas
>>
>> On May 2, 2018, at 06:53, Sidiney Crescencio <[email protected]> wrote:
>>
>> Hi All,
>>
>> I need help finding out which file this error refers to, or how to solve it.
>>
>> Apr 30 13:48:02 storage06 kernel: LustreError: 44779:0:(ofd_dev.c:1884:ofd_destroy_hdl()) wurfs-OST001c: error destroying object [0x1001c0000:0x1bfc24c:0x0]: -115
>>
>> I've been trying to map this to a file, but I can't since I don't have the FID.
>>
>> Does anyone know how to sort it out?
>>
>> Thanks in advance
>>
>> --
>> Best Regards,
>>
>> Sidiney
>>
>> _______________________________________________
>> lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
> --
> Best Regards,
>
> Sidiney Crescencio
> Technical Support Engineer
>
> Direct: +31 20 407 7550
> Skype: sidiney.crescencio_1
> [email protected]
>
> ClusterVision BV
> Gyroscoopweg 56
> 1042 AC Amsterdam
> The Netherlands
> Tel: +31 20 407 7550
> Fax: +31 84 759 8389
> www.clustervision.com
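The object in the original error, [0x1001c0000:0x1bfc24c:0x0], is what Andreas calls an OST FID. The POSIX sh sketch below decodes it and rebuilds the same object path that the debugfs command in this thread uses. It assumes the IDIF encoding seq = 0x100000000 | (ost_idx << 16) | (oid >> 32) and the O/0/d<oid % 32>/<oid> layout for such objects on ldiskfs; idif_to_objpath is a hypothetical helper, not a Lustre tool.

```shell
#!/bin/sh
# idif_to_objpath: hypothetical helper, not a Lustre tool.
# Assumptions: an IDIF sequence is 0x100000000 | (ost_idx << 16) | (oid >> 32),
# and ldiskfs stores such objects at O/0/d<oid % 32>/<oid>.
idif_to_objpath() {
    fid=${1#\[}                  # drop the leading "["
    fid=${fid%\]}                # drop the trailing "]"
    case $fid in
        0x*:0x*:0x*) ;;
        *) echo "idif_to_objpath: expected [seq:oid:ver], got '$1'" >&2
           return 1 ;;
    esac
    seq=$(( ${fid%%:*} ))        # first field: sequence
    rest=${fid#*:}
    oid=$(( ${rest%%:*} ))       # second field: object id (low 32 bits)
    if [ $(( $seq >> 32 )) -ne 1 ]; then
        echo "idif_to_objpath: $1 is not an IDIF (OST object) FID" >&2
        return 1
    fi
    ost_idx=$(( ($seq >> 16) & 0xffff ))
    # The high 16 bits of the object id live in the low bits of the sequence.
    oid=$(( (($seq & 0xffff) << 32) | $oid ))
    printf 'OST%04x O/0/d%d/%d\n' "$ost_idx" "$(( $oid % 32 ))" "$oid"
}
```

For the FID in the error message, `idif_to_objpath '[0x1001c0000:0x1bfc24c:0x0]'` prints `OST001c O/0/d12/29344332`: the index matches wurfs-OST001c, and the path is what `$((0x1bfc24c % 32))` and `$((0x1bfc24c))` expand to in the debugfs command. The parent FID [0xb00029318:0x1c23c:0x0] is rejected, since its sequence is not in the IDIF range.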
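In the debugfs output quoted above, the parent FID appears twice: once as 16 raw bytes (`fid = "18 93 02 00 ..."`) and once decoded (`parent=[0xb00029318:0x1c23c:0x0]`). A minimal sketch of the byte-to-FID step follows, assuming a little-endian layout of a u64 sequence, a u32 object id and a u32 version; decode_fid_bytes is a hypothetical helper, not part of Lustre or e2fsprogs.

```shell
#!/bin/sh
# decode_fid_bytes: hypothetical helper, not part of Lustre or e2fsprogs.
# Reassembles 16 hex bytes into [seq:oid:ver], assuming little-endian
# u64 seq, u32 oid, u32 ver.
decode_fid_bytes() {
    if [ "$#" -ne 16 ]; then
        echo "decode_fid_bytes: expected 16 bytes, got $#" >&2
        return 1
    fi
    for b in "$@"; do
        case $b in
            [0-9a-fA-F][0-9a-fA-F]) ;;
            *) echo "decode_fid_bytes: bad byte '$b'" >&2; return 1 ;;
        esac
    done
    # Little-endian: the most significant byte of each field comes last.
    hex_seq="$8$7$6$5$4$3$2$1"
    hex_oid="${12}${11}${10}$9"
    hex_ver="${16}${15}${14}${13}"
    printf '[0x%x:0x%x:0x%x]\n' "$((0x$hex_seq))" "$((0x$hex_oid))" "$((0x$hex_ver))"
}
```

Running `decode_fid_bytes 18 93 02 00 0b 00 00 00 3c c2 01 00 00 00 00 00` prints `[0xb00029318:0x1c23c:0x0]`, the same parent FID that debugfs decoded and that `lfs fid2path` then reported as "No such file or directory".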
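Each "haven't heard from client" line in the filtered log ends with three epoch timestamps (`cur`, `expire`, `last`). The sketch below extracts them with sed and does the subtraction. eviction_gap is a hypothetical helper, and reading `cur - expire` as the cutoff the server applied is an assumption; only the arithmetic itself is taken from the log.

```shell
#!/bin/sh
# eviction_gap: hypothetical helper. Given one eviction line, prints
# cur-last (how long the client was silent) and cur-expire (assumed to be
# the eviction cutoff), both in seconds.
eviction_gap() {
    nums=$(printf '%s\n' "$1" |
        sed -n 's/.*cur \([0-9][0-9]*\) expire \([0-9][0-9]*\) last \([0-9][0-9]*\).*/\1 \2 \3/p')
    if [ -z "$nums" ]; then
        echo "eviction_gap: no cur/expire/last fields found" >&2
        return 1
    fi
    set -- $nums                 # $1=cur $2=expire $3=last
    echo "silent=$(( $1 - $3 ))s cutoff=$(( $1 - $2 ))s"
}
```

For the May 2 15:54:17 line it prints `silent=1352s cutoff=900s`; every eviction line in the log above gives the same two values, so `cur - last` is the "1352 seconds" quoted in the message text.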
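On Andreas's hung-thread theory: if a service thread on storage06 were stuck, Lustre's watchdog would be expected to log a warning. Assuming that warning contains the text "service thread pid <N> was inactive" and names only the pid rather than the OST (the capitalisation differs between Lustre versions, hence `-i`), the `grep -i OST001c` filter used above would drop it. find_watchdog_msgs is a hypothetical helper, and the message pattern is an assumption to check against the installed Lustre version.

```shell
#!/bin/sh
# find_watchdog_msgs: hypothetical helper. Prints syslog lines that look
# like service-thread watchdog reports. Assumption: such lines contain
# "service thread pid <N> was inactive" (case varies by version).
find_watchdog_msgs() {
    cat -- "$@" | grep -iE 'service thread pid [0-9]+ was inactive'
}
```

Usage would be `find_watchdog_msgs /var/log/messages*` on the OSS, without the OST001c filter.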
