Re: [lustre-discuss] Error in lfsck: "NOT IMPLEMETED YET"

João Carlos Mendes Luís Tue, 23 Jul 2019 12:08:55 -0700

Please help!  Got another panic today, while migrating directories:

[Tue Jul 23 16:04:18 2019] LustreError: 52142:0:(service.c:189:ptlrpc_save_lock()) ASSERTION( rs->rs_nlocks < 8 ) failed:
[Tue Jul 23 16:04:18 2019] LustreError: 52142:0:(service.c:189:ptlrpc_save_lock()) LBUG
[Tue Jul 23 16:04:18 2019] Pid: 52142, comm: mdt00_002 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Tue Apr 30 22:18:15 UTC 2019

Message from syslogd@cmal14lb27 at Jul 23 16:04:15 ...
 kernel:[101545.117309] LustreError: 52142:0:(service.c:189:ptlrpc_save_lock()) ASSERTION( rs->rs_nlocks < 8 ) failed:

Message from syslogd@cmal14lb27 at Jul 23 16:04:15 ...
 kernel:[101545.119130] LustreError: 52142:0:(service.c:189:ptlrpc_save_lock()) LBUG

Message from syslogd@cmal14lb27 at Jul 23 16:04:15 ...
 kernel:LustreError: 52142:0:(service.c:189:ptlrpc_save_lock()) ASSERTION( rs->rs_nlocks < 8 ) failed:

Message from syslogd@cmal14lb27 at Jul 23 16:04:15 ...
 kernel:LustreError: 52142:0:(service.c:189:ptlrpc_save_lock()) LBUG

[Tue Jul 23 16:04:18 2019] Call Trace:
[Tue Jul 23 16:04:18 2019] [<ffffffffc0f717cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[Tue Jul 23 16:04:18 2019] [<ffffffffc0f7187c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[Tue Jul 23 16:04:18 2019] [<ffffffffc12f9c41>] ptlrpc_save_lock+0xc1/0xd0 [ptlrpc]
[Tue Jul 23 16:04:18 2019] [<ffffffffc1892b0b>] mdt_save_lock+0x20b/0x360 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc1892cbc>] mdt_object_unlock+0x5c/0x3c0 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc18aba52>] mdt_reint_striped_unlock+0x1a2/0x2f0 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc18abbc8>] mdt_migrate_object_unlock+0x28/0x60 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc18b0544>] mdt_reint_migrate+0x934/0x1310 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc18b0fa3>] mdt_reint_rec+0x83/0x210 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc188f1b3>] mdt_reint_internal+0x6e3/0xaf0 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc189a497>] mdt_reint+0x67/0x140 [mdt]
[Tue Jul 23 16:04:18 2019] [<ffffffffc1359e5a>] tgt_request_handle+0xaea/0x1580 [ptlrpc]
[Tue Jul 23 16:04:18 2019] [<ffffffffc12ff80b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[Tue Jul 23 16:04:18 2019] [<ffffffffc130313c>] ptlrpc_main+0xafc/0x1fc0 [ptlrpc]
[Tue Jul 23 16:04:18 2019] [<ffffffff9fec1c71>] kthread+0xd1/0xe0
[Tue Jul 23 16:04:18 2019] [<ffffffffa0575c37>] ret_from_fork_nospec_end+0x0/0x39
[Tue Jul 23 16:04:18 2019] [<ffffffffffffffff>] 0xffffffffffffffff

    Regards,

        Jonny


------------------------------------------------------------------------
globo.com       
*João Carlos Mendes Luís*
*Senior DevOps Engineer*
[email protected] <mailto:[email protected]>
+55-21-2483-6893
+55-21-99218-1222


On 22/07/2019 23:35, Joao Carlos Mendes Luis wrote:

On 7/22/19 11:10 PM, Andreas Dilger wrote:
If you are trying to delete MDT0000 then that is definitely not implemented yet...
No, no, no...
This was my first idea, but then I understood that the root directory is always on MDT0, so I had to migrate it to another server (after having created two more, and crashed during migration).
I will later try to migrate to another server, and then delete MDT2. But first I need to finish this lfsck... :-(
These "NOT IMPLEMETED (sic)" messages are just from running lfsck_start -A.
Cheers, Andreas
On Jul 22, 2019, at 16:08, João Carlos Mendes Luís <[email protected] <mailto:[email protected]>> wrote:
Hi,
I'm running some lab tests with Lustre 2.12.2 on Oracle Linux Server release 7.6. The last test I did was about migration and MDT splitting. I started with an MGS+MDS node and two OSS nodes, and one of the tests was to create two more MDSs and migrate data between them until, after some time, I could delete the original MDS. But something happened in the middle and the servers panicked/rebooted.
I am now hitting what appears to be an lfsck bug. After many other tests, I ran lfsck_start, and after some time got this message on the nodes:
MGS/MDS0:
[Mon Jul 22 17:42:25 2019] LustreError: 24107:0:(osd_index.c:1872:osd_index_it_get()) NOT IMPLEMETED YET (move to 0x2481000002000000)
OSS1/MDS1:
[Mon Jul 22 17:40:29 2019] LustreError: 31558:0:(osd_index.c:1872:osd_index_it_get()) NOT IMPLEMETED YET (move to 0xa41300c002000000)
OST2/MDS2:
[Mon Jul 22 17:40:32 2019] LustreError: 8935:0:(osd_index.c:1872:osd_index_it_get()) NOT IMPLEMETED YET (move to 0xa013000003000000)
And for the current lfsck status, I ran:

    lctl get_param *.*.lfsck* | grep -E 'status|\.lfsck_lay|\.lfsck_name'

MGS/MDS0:

mdd.mirror01-MDT0000.lfsck_layout=
status: completed
mdd.mirror01-MDT0000.lfsck_namespace=
status: partial

OSS1/MDS1:

mdd.mirror01-MDT0001.lfsck_layout=
status: completed
mdd.mirror01-MDT0001.lfsck_namespace=
status: partial
obdfilter.mirror01-OST0065.lfsck_layout=
status: completed

OST2/MDS2:

mdd.mirror01-MDT0002.lfsck_layout=
status: completed
mdd.mirror01-MDT0002.lfsck_namespace=
status: partial
obdfilter.mirror01-OST0066.lfsck_layout=
status: completed
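
For anyone who wants to script the status check above across several nodes, here is a minimal Python sketch. It only assumes the text layout shown in this message (a `name=` line followed by a `status:` line); the function name `incomplete_lfsck` and the `SAMPLE` text are illustrative, not part of Lustre, and the sketch does not call `lctl` itself.

```python
def incomplete_lfsck(output):
    """Return (parameter, status) pairs whose status is not 'completed'.

    Assumes each parameter appears as a line ending in '=' followed by a
    'status: <value>' line, as in the filtered output quoted above.
    """
    results = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.endswith("="):
            # Parameter header, e.g. mdd.mirror01-MDT0001.lfsck_namespace=
            current = line[:-1]
        elif line.startswith("status:") and current is not None:
            status = line.split(":", 1)[1].strip()
            if status != "completed":
                results.append((current, status))
    return results


# Sample text copied from the OSS1/MDS1 output above.
SAMPLE = """\
mdd.mirror01-MDT0001.lfsck_layout=
status: completed
mdd.mirror01-MDT0001.lfsck_namespace=
status: partial
obdfilter.mirror01-OST0065.lfsck_layout=
status: completed
"""

if __name__ == "__main__":
    for name, status in incomplete_lfsck(SAMPLE):
        print(f"{name}: {status}")
```

Fed the filtered `lctl get_param` output from each node, this would list only the components that did not finish, which here are the three `lfsck_namespace` entries.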

    Is this a known bug?  How do I fix these "partial" lfsck runs?

    Thanks for any help,


        Jonny




_______________________________________________
lustre-discuss mailing list
[email protected] <mailto:[email protected]>
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
    Regards,

        Jonny

--
João Carlos Mendes Luís
Globo.COM - +55-21-2483-6893

