I would strongly suggest upgrading to something newer than 2.7.0-rc4. That release is 3.5 years old, and you can imagine that some bugs have been fixed since then. Also, a search on https://jira.whamcloud.com/ shows that this bug has already been fixed.
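When searching Jira for a crash like this, the assertion text and the function that hit it usually make the best search terms. The following is only a minimal Python sketch, assuming the log layout seen in the quoted message; the name `jira_search_terms` and the regular expression are illustrative and not part of any Lustre tool.

```python
import re

# Matches the assertion line Lustre logs just before an LBUG, for example:
# "(osd_handler.c:1017:osd_trans_start()) ASSERTION( ... ) failed:"
ASSERT_RE = re.compile(
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"ASSERTION\(\s*(?P<expr>.*?)\s*\)\s+failed"
)


def jira_search_terms(log_text):
    """Return unique (function, assertion expression) pairs found in a log."""
    # Collapse whitespace first so assertions wrapped across lines still match.
    flat = " ".join(log_text.split())
    terms = []
    for match in ASSERT_RE.finditer(flat):
        term = (match.group("func"), match.group("expr"))
        if term not in terms:
            terms.append(term)
    return terms


# Excerpt copied from the quoted log, including its line wrapping.
sample = """
Nov 28 10:52:49 localhost kernel: LustreError:
8355:0:(osd_handler.c:1017:osd_trans_start()) ASSERTION(
get_current()->journal_info == ((void *)0) ) failed:
"""

for func, expr in jira_search_terms(sample):
    print(f'{func} ASSERTION "{expr}"')
```

Pasting the printed function name together with the assertion expression into the Jira search box is one way to narrow down the matching ticket.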
Cheers, Andreas

> On Nov 28, 2018, at 07:37, Guillaume Postic
> <[email protected]> wrote:
>
> Hello,
>
> When running 'mount.lustre /dev/sdb /mdt', I got the following errors:
>
> --------------------------------------------------------------------------------
> Nov 28 10:52:27 localhost kernel: LNet: HW CPU cores: 32, npartitions: 4
> Nov 28 10:52:27 localhost kernel: alg: No test for adler32 (adler32-zlib)
> Nov 28 10:52:27 localhost kernel: alg: No test for crc32 (crc32-table)
> Nov 28 10:52:27 localhost kernel: alg: No test for crc32 (crc32-pclmul)
> Nov 28 10:52:35 localhost kernel: Lustre: Lustre: Build Version:
> 2.7.0-RC4--PRISTINE-2.6.32-504.8.1.el6_lustre.x86_64
> Nov 28 10:52:35 localhost kernel: LNet: Added LNI 10.0.1.60@tcp
> [8/256/0/180]
> Nov 28 10:52:35 localhost kernel: LNet: Added LNI 172.27.7.38@tcp1
> [8/256/0/180]
> Nov 28 10:52:35 localhost kernel: LNet: Accept secure, port 988
> Nov 28 10:52:37 localhost kernel: LDISKFS-fs (sdb): recovery complete
> Nov 28 10:52:37 localhost kernel: LDISKFS-fs (sdb): mounted filesystem
> with ordered data mode. quota=on. Opts:
> Nov 28 10:52:47 localhost kernel: Lustre: lustre-MDD0000: changelog on
> Nov 28 10:52:47 localhost kernel: Lustre: lustre-MDT0000: Will be in
> recovery for at least 5:00, or until 112 clients reconnect
> Nov 28 10:52:49 localhost kernel: Lustre: lustre-MDT0000: Client
> 5800a16f-8e18-e4f3-32a0-041e00a27e97 (at 10.0.1.102@tcp) reconnecting,
> waiting for 112 clients in recovery for 4:57
> Nov 28 10:52:49 localhost kernel: Lustre:
> 8210:0:(client.c:1939:ptlrpc_expire_one_request()) @@@ Request sent has
> timed out for slow reply: [sent 1543398764/real 1543398764]
> req@ffff88081ef2a080 x1618370892922924/t0(0)
> o8->[email protected]@tcp:28/4 lens 400/544 e 0 to 1
> dl 1543398769 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
> Nov 28 10:52:49 localhost kernel: LustreError:
> 8355:0:(osd_handler.c:1017:osd_trans_start()) ASSERTION(
> get_current()->journal_info == ((void *)0) ) failed:
> Nov 28 10:52:49 localhost kernel: LustreError:
> 8355:0:(osd_handler.c:1017:osd_trans_start()) LBUG
> Nov 28 10:52:49 localhost kernel: Pid: 8355, comm: mdt03_003
> Nov 28 10:52:49 localhost kernel:
> Nov 28 10:52:49 localhost kernel: Call Trace:
> Nov 28 10:52:49 localhost kernel: [<ffffffffa031b895>]
> libcfs_debug_dumpstack+0x55/0x80 [libcfs]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa031be97>]
> lbug_with_loc+0x47/0xb0 [libcfs]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0be424d>]
> osd_trans_start+0x25d/0x660 [osd_ldiskfs]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0434b4a>]
> llog_osd_destroy+0x42a/0xd40 [obdclass]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa042dedc>]
> llog_cat_new_log+0x1ec/0x710 [obdclass]
>
> Message from syslogd@localhost at Nov 28 10:52:49 ...
> kernel:LustreError: 8355:0:(osd_handler.c:1017:osd_trans_start())
> ASSERTION( get_current()->journal_info == ((void *)0) ) failed:
>
> Message from syslogd@localhost at Nov 28 10:52:49 ...
> kernel:LustreError: 8355:0:(osd_handler.c:1017:osd_trans_start()) LBUG
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0eab54d>] ?
> lod_xattr_set_internal+0x1bd/0x420 [lod]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa042e50a>]
> llog_cat_add_rec+0x10a/0x450 [obdclass]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa04261e9>]
> llog_add+0x89/0x1c0 [obdclass]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0f084e2>]
> mdd_changelog_store+0x122/0x290 [mdd]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0f08825>]
> mdd_changelog_ns_store+0x1d5/0x610 [mdd]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0f0c2c2>] ?
> mdd_links_rename+0x2f2/0x530 [mdd]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0f0d76a>] ?
> __mdd_index_insert+0x5a/0x160 [mdd]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0f173c8>]
> mdd_create+0x12b8/0x1730 [mdd]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0de1cb8>]
> mdo_create+0x18/0x50 [mdt]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0debe6f>]
> mdt_reint_open+0x1f8f/0x2c70 [mdt]
> Nov 28 10:52:49 localhost kernel: [<ffffffff8109eefc>] ?
> remove_wait_queue+0x3c/0x50
> Nov 28 10:52:49 localhost kernel: [<ffffffffa033883c>] ?
> upcall_cache_get_entry+0x29c/0x880 [libcfs]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0dd30cd>]
> mdt_reint_rec+0x5d/0x200 [mdt]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0db723b>]
> mdt_reint_internal+0x4cb/0x7a0 [mdt]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0db7706>]
> mdt_intent_reint+0x1f6/0x430 [mdt]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa0db5cf4>]
> mdt_intent_policy+0x494/0xce0 [mdt]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa063f4f9>]
> ldlm_lock_enqueue+0x129/0x9d0 [ptlrpc]
> Nov 28 10:52:49 localhost kernel: [<ffffffffa066b46b>]
> ldlm_handle_enqueue0+0x51b/0x13f0 [ptlrpc]
> --------------------------------------------------------------------------------
>
> Does anyone know how to solve that problem?
>
> Build version: 2.7.0-RC4--PRISTINE-2.6.32-504.8.1.el6_lustre.x86_64
>
> Thanks a lot,
> Guillaume Postic
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Cheers, Andreas
---
Andreas Dilger
CTO Whamcloud
