Thanks for the fast reply! If I understood correctly, it is currently not possible to use the changelog feature together with the snapshot feature, right?
Is there already an LU ticket about that?

Cheers,
Robert

On 09/10/2018 02:57 PM, Yong, Fan wrote:
> It is suspected that some llogs were still waiting to be handled when
> the snapshot was taken. Then, when such a snapshot is mounted, some
> conditions trigger the llog cleanup/modification automatically. So it
> is not related to your actions when mounting the snapshot. Since we
> cannot control the system state at the moment the snapshot is taken,
> we have to skip llog-related cleanup/modification against the snapshot
> when mounting it. Such "skip" logic is exactly what we need.
>
> Cheers,
> Nasf
>
> *From:* lustre-discuss [mailto:[email protected]]
> *On Behalf Of* Robert Redl
> *Sent:* Saturday, September 8, 2018 9:04 PM
> *To:* [email protected]
> *Subject:* Re: [lustre-discuss] Lustre/ZFS snapshots mount error
>
> Dear All,
>
> we have a similar setup with Lustre on ZFS, and we make regular use of
> snapshots for backups (backups to tape use snapshots as their source).
> We would like to use robinhood in the future, and the question now is
> how to do it.
>
> Would it be a workaround to disable the robinhood daemon temporarily
> during the mount process?
> Does the problem only occur when changelogs are consumed while a
> snapshot is being mounted? Or is it also a problem when changelogs are
> consumed while the snapshot remains mounted (which for us is typically
> several hours)?
> Is there already an LU ticket about this issue?
>
> Thanks!
> Robert
>
> --
> Dr. Robert Redl
> Scientific Programmer, "Waves to Weather" (SFB/TRR165)
> Meteorologisches Institut
> Ludwig-Maximilians-Universität München
> Theresienstr. 37, 80333 München, Germany
>
> On 03.09.2018 at 08:16, Yong, Fan wrote:
> > I would say that it is not your order of operations that caused the
> > trouble. Instead, it is related to the snapshot mount logic. As
> > mentioned in my earlier reply, we need a patch for the llog logic to
> > avoid modifying llogs in snapshot mode.
> >
> > --
> > Cheers,
> > Nasf
> >
> > *From:* Kirk, Benjamin (JSC-EG311) [mailto:[email protected]]
> > *Sent:* Tuesday, August 28, 2018 7:53 PM
> > *To:* [email protected]
> > *Cc:* Andreas Dilger <[email protected]>; Yong, Fan <[email protected]>
> > *Subject:* Re: [lustre-discuss] Lustre/ZFS snapshots mount error
> >
> > The MDS situation is very basic: active/passive mds0/mds1 for both
> > fsA & fsB. fsA has the combined mgt/mdt in a single ZFS filesystem,
> > and fsB has its own mdt in a separate ZFS filesystem. mds0 is primary
> > for all.
> >
> > fsA & fsB DO both have changelogs enabled to feed robinhood databases.
> >
> > What's the recommended procedure we should follow here before
> > mounting the snapshots?
> >
> > 1) disable changelogs on the active MDTs (this will compromise
> >    robinhood, requiring a rescan…), or
> > 2) temporarily halt changelog consumption/cleanup (e.g. stop
> >    robinhood in our case) and then mount the snapshot?
> >
> > Thanks for the help!
> >
> > --
> > Benjamin S. Kirk, Ph.D.
> > NASA Lyndon B. Johnson Space Center
> > Acting Chief, Aeroscience & Flight Mechanics Division
> >
> > On Aug 27, 2018, at 7:33 PM, Yong, Fan <[email protected]> wrote:
> > > According to the stack trace, someone was trying to clean up old
> > > empty llogs while the snapshot was being mounted. We do NOT allow
> > > any modification while mounting a snapshot; otherwise, it will
> > > trigger a ZFS backend BUG(). That is why we added an LASSERT() when
> > > starting the transaction. One possible solution is to add a check
> > > in the llog logic to avoid modifying llogs in snapshot mode.
> > >
> > > --
> > > Cheers,
> > > Nasf
> > >
> > > -----Original Message-----
> > > From: lustre-discuss [mailto:[email protected]] On Behalf Of Andreas Dilger
> > > Sent: Tuesday, August 28, 2018 5:57 AM
> > > To: Kirk, Benjamin (JSC-EG311) <[email protected]>
> > > Cc: [email protected]
> > > Subject: Re: [lustre-discuss] Lustre/ZFS snapshots mount error
> > >
> > > It's probably best to file an LU ticket for this issue.
> > >
> > > It looks like something in the log processing at mount is trying to
> > > modify the configuration files. I'm not sure whether that should be
> > > allowed or not.
> > >
> > > Does fsB have the same MGS as fsA? Does it have the same MDS node as
> > > fsA?
> > > If it has a different MDS, you might consider giving it its own MGS
> > > as well. That doesn't have to be a separate MGS node, just a
> > > separate filesystem (ZFS fileset in the same zpool) on the MDS node.
> > >
> > > Cheers, Andreas
> > >
> > > On Aug 27, 2018, at 10:18, Kirk, Benjamin (JSC-EG311) <[email protected]> wrote:
> > > > Hi all,
> > > >
> > > > We have two filesystems, fsA & fsB (eadc below), both of which get
> > > > snapshots taken daily, rotated over a week. It's a beautiful feature
> > > > we've been using in production ever since it was introduced with
> > > > 2.10.
> > > >
> > > > -) We've got Lustre/ZFS 2.10.4 on CentOS 7.5.
> > > > -) Both fsA & fsB have changelogs active.
> > > > -) fsA has combined mgt/mdt on a single ZFS filesystem.
> > > > -) fsB has a single mdt on a single ZFS filesystem.
> > > > -) for fsA, I have no issues mounting any of the snapshots via lctl.
> > > > -) for fsB, I can mount the three most recent snapshots, then
> > > >    encounter errors:
> > > >
> > > > [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Mon
> > > > mounted the snapshot eadc_AutoSS-Mon with fsname 3d40bbc
> > > > [root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Mon
> > > > [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sun
> > > > mounted the snapshot eadc_AutoSS-Sun with fsname 584c07a
> > > > [root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sun
> > > > [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Sat
> > > > mounted the snapshot eadc_AutoSS-Sat with fsname 4e646fe
> > > > [root@hpfs-fsl-mds0 ~]# lctl snapshot_umount -F eadc -n eadc_AutoSS-Sat
> > > > [root@hpfs-fsl-mds0 ~]# lctl snapshot_mount -F eadc -n eadc_AutoSS-Fri
> > > > mount.lustre: mount metadata/meta-eadc@eadc_AutoSS-Fri at /mnt/eadc_AutoSS-Fri_MDT0000 failed: Read-only file system
> > > > Can't mount the snapshot eadc_AutoSS-Fri: Read-only file system
> > > >
> > > > The relevant bits from dmesg are:
> > > >
> > > > [1353434.417762] Lustre: 3d40bbc-MDT0000: set dev_rdonly on this device
> > > > [1353434.417765] Lustre: Skipped 3 previous similar messages
> > > > [1353434.649480] Lustre: 3d40bbc-MDT0000: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
> > > > [1353434.649484] Lustre: Skipped 3 previous similar messages
> > > > [1353434.866228] Lustre: 3d40bbc-MDD0000: changelog on
> > > > [1353434.866233] Lustre: Skipped 1 previous similar message
> > > > [1353435.427744] Lustre: 3d40bbc-MDT0000: Connection restored to ...@tcp (at ...@tcp)
> > > > [1353435.427747] Lustre: Skipped 23 previous similar messages
> > > > [1353445.255899] Lustre: Failing over 3d40bbc-MDT0000
> > > > [1353445.255903] Lustre: Skipped 3 previous similar messages
> > > > [1353445.256150] LustreError: 11-0: 3d40bbc-OST0000-osc-MDT0000: operation ost_disconnect to node ...@tcp failed: rc = -107
> > > > [1353445.257896] LustreError: Skipped 23 previous similar messages
> > > > [1353445.353874] Lustre: server umount 3d40bbc-MDT0000 complete
> > > > [1353445.353877] Lustre: Skipped 3 previous similar messages
> > > > [1353475.302224] Lustre: 4e646fe-MDD0000: changelog on
> > > > [1353475.302228] Lustre: Skipped 1 previous similar message
> > > > [1353498.964016] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) 36ca26b-MDT0000-osd: someone try to start transaction under readonly mode, should be disabled.
> > > > [1353498.967260] LustreError: 25582:0:(osd_handler.c:341:osd_trans_create()) Skipped 1 previous similar message
> > > > [1353498.968829] CPU: 6 PID: 25582 Comm: mount.lustre Kdump: loaded Tainted: P OE ------------ 3.10.0-862.6.3.el7.x86_64 #1
> > > > [1353498.968830] Hardware name: Supermicro SYS-6027TR-D71FRF/X9DRT, BIOS 3.2a 08/04/2015
> > > > [1353498.968832] Call Trace:
> > > > [1353498.968841] [<ffffffffb5b0e80e>] dump_stack+0x19/0x1b
> > > > [1353498.968851] [<ffffffffc0cbe5db>] osd_trans_create+0x38b/0x3d0 [osd_zfs]
> > > > [1353498.968876] [<ffffffffc1116044>] llog_destroy+0x1f4/0x3f0 [obdclass]
> > > > [1353498.968887] [<ffffffffc111f0f6>] llog_cat_reverse_process_cb+0x246/0x3f0 [obdclass]
> > > > [1353498.968897] [<ffffffffc111a32c>] llog_reverse_process+0x38c/0xaa0 [obdclass]
> > > > [1353498.968910] [<ffffffffc111eeb0>] ? llog_cat_process_cb+0x4e0/0x4e0 [obdclass]
> > > > [1353498.968922] [<ffffffffc111af69>] llog_cat_reverse_process+0x179/0x270 [obdclass]
> > > > [1353498.968932] [<ffffffffc1115585>] ? llog_init_handle+0xd5/0x9a0 [obdclass]
> > > > [1353498.968943] [<ffffffffc1116e78>] ? llog_open_create+0x78/0x320 [obdclass]
> > > > [1353498.968949] [<ffffffffc12e55f0>] ? mdd_root_get+0xf0/0xf0 [mdd]
> > > > [1353498.968954] [<ffffffffc12ec7af>] mdd_prepare+0x13ff/0x1c70 [mdd]
> > > > [1353498.968966] [<ffffffffc166b037>] mdt_prepare+0x57/0x3b0 [mdt]
> > > > [1353498.968983] [<ffffffffc1183afd>] server_start_targets+0x234d/0x2bd0 [obdclass]
> > > > [1353498.968999] [<ffffffffc1153500>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
> > > > [1353498.969012] [<ffffffffc118541d>] server_fill_super+0x109d/0x185a [obdclass]
> > > > [1353498.969025] [<ffffffffc115cef8>] lustre_fill_super+0x328/0x950 [obdclass]
> > > > [1353498.969038] [<ffffffffc115cbd0>] ? lustre_common_put_super+0x270/0x270 [obdclass]
> > > > [1353498.969041] [<ffffffffb561f3bf>] mount_nodev+0x4f/0xb0
> > > > [1353498.969053] [<ffffffffc1154f18>] lustre_mount+0x38/0x60 [obdclass]
> > > > [1353498.969055] [<ffffffffb561ff3e>] mount_fs+0x3e/0x1b0
> > > > [1353498.969060] [<ffffffffb563d4b7>] vfs_kern_mount+0x67/0x110
> > > > [1353498.969062] [<ffffffffb563fadf>] do_mount+0x1ef/0xce0
> > > > [1353498.969066] [<ffffffffb55f7c2c>] ? kmem_cache_alloc_trace+0x3c/0x200
> > > > [1353498.969069] [<ffffffffb5640913>] SyS_mount+0x83/0xd0
> > > > [1353498.969074] [<ffffffffb5b20795>] system_call_fastpath+0x1c/0x21
> > > > [1353498.969079] LustreError: 25582:0:(llog_cat.c:1027:llog_cat_reverse_process_cb()) 36ca26b-MDD0000: fail to destroy empty log: rc = -30
> > > > [1353498.970785] CPU: 6 PID: 25582 Comm: mount.lustre Kdump: loaded Tainted: P OE ------------ 3.10.0-862.6.3.el7.x86_64 #1
> > > > [1353498.970786] Hardware name: Supermicro SYS-6027TR-D71FRF/X9DRT, BIOS 3.2a 08/04/2015
> > > > [1353498.970787] Call Trace:
> > > > [1353498.970790] [<ffffffffb5b0e80e>] dump_stack+0x19/0x1b
> > > > [1353498.970795] [<ffffffffc0cbe5db>] osd_trans_create+0x38b/0x3d0 [osd_zfs]
> > > > [1353498.970807] [<ffffffffc1117921>] llog_cancel_rec+0xc1/0x880 [obdclass]
> > > > [1353498.970817] [<ffffffffc111e13b>] llog_cat_cleanup+0xdb/0x380 [obdclass]
> > > > [1353498.970827] [<ffffffffc111f14d>] llog_cat_reverse_process_cb+0x29d/0x3f0 [obdclass]
> > > > [1353498.970838] [<ffffffffc111a32c>] llog_reverse_process+0x38c/0xaa0 [obdclass]
> > > > [1353498.970848] [<ffffffffc111eeb0>] ? llog_cat_process_cb+0x4e0/0x4e0 [obdclass]
> > > > [1353498.970858] [<ffffffffc111af69>] llog_cat_reverse_process+0x179/0x270 [obdclass]
> > > > [1353498.970868] [<ffffffffc1115585>] ? llog_init_handle+0xd5/0x9a0 [obdclass]
> > > > [1353498.970878] [<ffffffffc1116e78>] ? llog_open_create+0x78/0x320 [obdclass]
> > > > [1353498.970883] [<ffffffffc12e55f0>] ? mdd_root_get+0xf0/0xf0 [mdd]
> > > > [1353498.970887] [<ffffffffc12ec7af>] mdd_prepare+0x13ff/0x1c70 [mdd]
> > > > [1353498.970894] [<ffffffffc166b037>] mdt_prepare+0x57/0x3b0 [mdt]
> > > > [1353498.970908] [<ffffffffc1183afd>] server_start_targets+0x234d/0x2bd0 [obdclass]
> > > > [1353498.970924] [<ffffffffc1153500>] ? class_config_dump_handler+0x7e0/0x7e0 [obdclass]
> > > > [1353498.970938] [<ffffffffc118541d>] server_fill_super+0x109d/0x185a [obdclass]
> > > > [1353498.970950] [<ffffffffc115cef8>] lustre_fill_super+0x328/0x950 [obdclass]
> > > > [1353498.970962] [<ffffffffc115cbd0>] ? lustre_common_put_super+0x270/0x270 [obdclass]
> > > > [1353498.970964] [<ffffffffb561f3bf>] mount_nodev+0x4f/0xb0
> > > > [1353498.970976] [<ffffffffc1154f18>] lustre_mount+0x38/0x60 [obdclass]
> > > > [1353498.970978] [<ffffffffb561ff3e>] mount_fs+0x3e/0x1b0
> > > > [1353498.970980] [<ffffffffb563d4b7>] vfs_kern_mount+0x67/0x110
> > > > [1353498.970982] [<ffffffffb563fadf>] do_mount+0x1ef/0xce0
> > > > [1353498.970984] [<ffffffffb55f7c2c>] ? kmem_cache_alloc_trace+0x3c/0x200
> > > > [1353498.970986] [<ffffffffb5640913>] SyS_mount+0x83/0xd0
> > > > [1353498.970989] [<ffffffffb5b20795>] system_call_fastpath+0x1c/0x21
> > > > [1353498.970996] LustreError: 25582:0:(mdd_device.c:354:mdd_changelog_llog_init()) 36ca26b-MDD0000: changelog init failed: rc = -30
> > > > [1353498.972790] LustreError: 25582:0:(mdd_device.c:427:mdd_changelog_init()) 36ca26b-MDD0000: changelog setup during init failed: rc = -30
> > > > [1353498.974525] LustreError: 25582:0:(mdd_device.c:1061:mdd_prepare()) 36ca26b-MDD0000: failed to initialize changelog: rc = -30
> > > > [1353498.976229] LustreError: 25582:0:(obd_mount_server.c:1879:server_fill_super()) Unable to start targets: -30
> > > > [1353499.072002] LustreError: 25582:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount (-30)
> > > >
> > > > I'm hoping those traces mean something to someone. Any ideas?
> > > >
> > > > Thanks!
> > > >
> > > > --
> > > > Benjamin S. Kirk
> > > >
> > > > _______________________________________________
> > > > lustre-discuss mailing list
> > > > [email protected]
> > > > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> > >
> > > Cheers, Andreas
> > > ---
> > > Andreas Dilger
> > > CTO Whamcloud
