On 2013/12/17 9:37 AM, "Sten Wolf" <s...@checkpalm.com> wrote:
>This is my situation:
>I have 2 nodes, MDS1 and MDS2 (10.0.0.22, 10.0.0.23), which I wish to use
>as a failover MGS and active/active MDTs with ZFS.
>I have a JBOD shelf with 12 disks, seen by both nodes as DAS (the shelf
>has 2 SAS ports, connected to a SAS HBA on each node), and I am using
>Lustre 2.4 on CentOS 6.4 x86_64.
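With a dual-ported shelf like this, it is worth confirming up front that
both nodes enumerate exactly the same devices by WWN before any pools are
created. A minimal sketch, not from the original report, assuming
passwordless ssh between the nodes:

# ls /dev/disk/by-id/wwn-* | sort > /tmp/wwn.mds1
# ssh mds2 'ls /dev/disk/by-id/wwn-* | sort' > /tmp/wwn.mds2
# diff /tmp/wwn.mds1 /tmp/wwn.mds2 && echo "both nodes see the same disks"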
If you are using ZFS + DNE (multiple MDTs), I'd strongly recommend using
Lustre 2.5 instead of 2.4. There were quite a number of fixes in that
version for both of those features (which are both new in 2.4). Also,
Lustre 2.5 is the new long-term maintenance stream, so there will be
regular updates for that version.

I have to admit that the combination of those two features has been tested
less than either ZFS + 1 MDT or ldiskfs + 2+ MDTs separately. There are
also a couple of known performance issues with the interaction of these
features that are not yet fixed. I do expect that this combination works,
but there will likely be some issues that haven't been seen before.

Cheers, Andreas

>I have created 3 zfs pools:
>1. mgs:
># zpool create -f -o ashift=12 -O canmount=off lustre-mgs mirror \
>    /dev/disk/by-id/wwn-0x50000c0f012306fc \
>    /dev/disk/by-id/wwn-0x50000c0f01233aec
># mkfs.lustre --mgs --servicenode=mds1@tcp0 --servicenode=mds2@tcp0 \
>    --param sys.timeout=5000 --backfstype=zfs lustre-mgs/mgs
>
>   Permanent disk data:
>Target:     MGS
>Index:      unassigned
>Lustre FS:
>Mount type: zfs
>Flags:      0x1064
>            (MGS first_time update no_primnode )
>Persistent mount opts:
>Parameters: failover.node=10.0.0.22@tcp failover.node=10.0.0.23@tcp
>            sys.timeout=5000
>
>2. mdt0:
># zpool create -f -o ashift=12 -O canmount=off lustre-mdt0 mirror \
>    /dev/disk/by-id/wwn-0x50000c0f01d07a34 \
>    /dev/disk/by-id/wwn-0x50000c0f01d110c8
># mkfs.lustre --mdt --fsname=fs0 --servicenode=mds1@tcp0 \
>    --servicenode=mds2@tcp0 --param sys.timeout=5000 --backfstype=zfs \
>    --mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0 lustre-mdt0/mdt0
>warning: lustre-mdt0/mdt0: for Lustre 2.4 and later, the target index
>must be specified with --index
>
>   Permanent disk data:
>Target:     fs0:MDT0000
>Index:      0
>Lustre FS:  fs0
>Mount type: zfs
>Flags:      0x1061
>            (MDT first_time update no_primnode )
>Persistent mount opts:
>Parameters: failover.node=10.0.0.22@tcp failover.node=10.0.0.23@tcp
>            sys.timeout=5000 mgsnode=10.0.0.22@tcp mgsnode=10.0.0.23@tcp
>
>checking for existing Lustre data: not found
>mkfs_cmd = zfs create -o canmount=off -o xattr=sa lustre-mdt0/mdt0
>Writing lustre-mdt0/mdt0 properties
>  lustre:version=1
>  lustre:flags=4193
>  lustre:index=0
>  lustre:fsname=fs0
>  lustre:svname=fs0:MDT0000
>  lustre:failover.node=10.0.0.22@tcp
>  lustre:failover.node=10.0.0.23@tcp
>  lustre:sys.timeout=5000
>  lustre:mgsnode=10.0.0.22@tcp
>  lustre:mgsnode=10.0.0.23@tcp
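The mkfs.lustre warning above is worth acting on: for Lustre 2.4 and later
the MDT index should be given explicitly, as was done for mdt1 below. A
sketch of the same mdt0 format command with the index spelled out (flags
copied from the transcript above, not re-run here):

# mkfs.lustre --mdt --fsname=fs0 --index=0 \
    --servicenode=mds1@tcp0 --servicenode=mds2@tcp0 \
    --mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0 \
    --param sys.timeout=5000 --backfstype=zfs lustre-mdt0/mdt0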
>3. mdt1:
># zpool create -f -o ashift=12 -O canmount=off lustre-mdt1 mirror \
>    /dev/disk/by-id/wwn-0x50000c0f01d113e0 \
>    /dev/disk/by-id/wwn-0x50000c0f01d116fc
># mkfs.lustre --mdt --fsname=fs0 --servicenode=mds2@tcp0 \
>    --servicenode=mds1@tcp0 --param sys.timeout=5000 --backfstype=zfs \
>    --index=1 --mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0 lustre-mdt1/mdt1
>
>   Permanent disk data:
>Target:     fs0:MDT0001
>Index:      1
>Lustre FS:  fs0
>Mount type: zfs
>Flags:      0x1061
>            (MDT first_time update no_primnode )
>Persistent mount opts:
>Parameters: failover.node=10.0.0.23@tcp failover.node=10.0.0.22@tcp
>            sys.timeout=5000 mgsnode=10.0.0.22@tcp mgsnode=10.0.0.23@tcp
>
>checking for existing Lustre data: not found
>mkfs_cmd = zfs create -o canmount=off -o xattr=sa lustre-mdt1/mdt1
>Writing lustre-mdt1/mdt1 properties
>  lustre:version=1
>  lustre:flags=4193
>  lustre:index=1
>  lustre:fsname=fs0
>  lustre:svname=fs0:MDT0001
>  lustre:failover.node=10.0.0.23@tcp
>  lustre:failover.node=10.0.0.22@tcp
>  lustre:sys.timeout=5000
>  lustre:mgsnode=10.0.0.22@tcp
>  lustre:mgsnode=10.0.0.23@tcp
>
>A few basic sanity checks:
># zfs list
>NAME               USED  AVAIL  REFER  MOUNTPOINT
>lustre-mdt0        824K  3.57T   136K  /lustre-mdt0
>lustre-mdt0/mdt0   136K  3.57T   136K  /lustre-mdt0/mdt0
>lustre-mdt1        716K  3.57T   136K  /lustre-mdt1
>lustre-mdt1/mdt1   136K  3.57T   136K  /lustre-mdt1/mdt1
>lustre-mgs        4.78M  3.57T   136K  /lustre-mgs
>lustre-mgs/mgs    4.18M  3.57T  4.18M  /lustre-mgs/mgs
>
># zpool list
>NAME          SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
>lustre-mdt0  3.62T  1.00M  3.62T   0%  1.00x  ONLINE  -
>lustre-mdt1  3.62T   800K  3.62T   0%  1.00x  ONLINE  -
>lustre-mgs   3.62T  4.86M  3.62T   0%  1.00x  ONLINE  -
>
># zpool status
>  pool: lustre-mdt0
> state: ONLINE
>  scan: none requested
>config:
>
>        NAME                        STATE     READ WRITE CKSUM
>        lustre-mdt0                 ONLINE       0     0     0
>          mirror-0                  ONLINE       0     0     0
>            wwn-0x50000c0f01d07a34  ONLINE       0     0     0
>            wwn-0x50000c0f01d110c8  ONLINE       0     0     0
>
>errors: No known data errors
>
>  pool: lustre-mdt1
> state: ONLINE
>  scan: none requested
>config:
>
>        NAME                        STATE     READ WRITE CKSUM
>        lustre-mdt1                 ONLINE       0     0     0
>          mirror-0                  ONLINE       0     0     0
>            wwn-0x50000c0f01d113e0  ONLINE       0     0     0
>            wwn-0x50000c0f01d116fc  ONLINE       0     0     0
>
>errors: No known data errors
>
>  pool: lustre-mgs
> state: ONLINE
>  scan: none requested
>config:
>
>        NAME                        STATE     READ WRITE CKSUM
>        lustre-mgs                  ONLINE       0     0     0
>          mirror-0                  ONLINE       0     0     0
>            wwn-0x50000c0f012306fc  ONLINE       0     0     0
>            wwn-0x50000c0f01233aec  ONLINE       0     0     0
>
>errors: No known data errors
>
># zfs get lustre:svname lustre-mgs/mgs
>NAME            PROPERTY       VALUE  SOURCE
>lustre-mgs/mgs  lustre:svname  MGS    local
># zfs get lustre:svname lustre-mdt0/mdt0
>NAME              PROPERTY       VALUE        SOURCE
>lustre-mdt0/mdt0  lustre:svname  fs0:MDT0000  local
># zfs get lustre:svname lustre-mdt1/mdt1
>NAME              PROPERTY       VALUE        SOURCE
>lustre-mdt1/mdt1  lustre:svname  fs0:MDT0001  local
>
>So far, so good.
>My /etc/ldev.conf:
>mds1 mds2 MGS         zfs:lustre-mgs/mgs
>mds1 mds2 fs0-MDT0000 zfs:lustre-mdt0/mdt0
>mds2 mds1 fs0-MDT0001 zfs:lustre-mdt1/mdt1
>
>My /etc/modprobe.d/lustre.conf:
># options lnet networks=tcp0(em1)
>options lnet ip2nets="tcp0 10.0.0.[22,23]; tcp0 10.0.0.*;"
>---------------------------------------------------------------------------
>
>Now, when starting the services, I get strange errors:
># service lustre start local
>Mounting lustre-mgs/mgs on /mnt/lustre/local/MGS
>Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
>mount.lustre: mount lustre-mdt0/mdt0 at /mnt/lustre/local/fs0-MDT0000
>failed: Input/output error
>Is the MGS running?
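When the MDT mount fails with EIO immediately after the MGS mount, the
usual first step is to check whether an MGS device actually came up on
this node before mount.lustre tried to register the MDT against it. A
manual sketch reusing the mount points from the init script output above
(not a verified fix for this report):

# mount -t lustre lustre-mgs/mgs /mnt/lustre/local/MGS
# lctl dl            # an MGS device should now appear in the device list
# mount -t lustre lustre-mdt0/mdt0 /mnt/lustre/local/fs0-MDT0000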
># service lustre status local
>running
>
>attached lctl-dk.local01
>
>If I run the same command again, I get a different error:
>
># service lustre start local
>Mounting lustre-mgs/mgs on /mnt/lustre/local/MGS
>mount.lustre: according to /etc/mtab lustre-mgs/mgs is already mounted
>on /mnt/lustre/local/MGS
>Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
>mount.lustre: mount lustre-mdt0/mdt0 at /mnt/lustre/local/fs0-MDT0000
>failed: File exists
>
>attached lctl-dk.local02
>
>What am I doing wrong?
>I have tested LNET self-test as well, using the following script:
># cat lnet-selftest.sh
>#!/bin/bash
>export LST_SESSION=$$
>lst new_session read/write
>lst add_group servers 10.0.0.[22,23]@tcp
>lst add_group readers 10.0.0.[22,23]@tcp
>lst add_group writers 10.0.0.[22,23]@tcp
>lst add_batch bulk_rw
>lst add_test --batch bulk_rw --from readers --to servers \
>    brw read check=simple size=1M
>lst add_test --batch bulk_rw --from writers --to servers \
>    brw write check=full size=4K
># start running
>lst run bulk_rw
># display server stats for 30 seconds
>lst stat servers & sleep 30; kill $!
># tear down
>lst end_session
>
>and it seemed OK:
># modprobe lnet-selftest && ssh mds2 modprobe lnet-selftest
># ./lnet-selftest.sh
>SESSION: read/write FEATURES: 0 TIMEOUT: 300 FORCE: No
>10.0.0.[22,23]@tcp are added to session
>10.0.0.[22,23]@tcp are added to session
>10.0.0.[22,23]@tcp are added to session
>Test was added successfully
>Test was added successfully
>bulk_rw is running now
>[LNet Rates of servers]
>[R] Avg: 19486 RPC/s Min: 19234 RPC/s Max: 19739 RPC/s
>[W] Avg: 19486 RPC/s Min: 19234 RPC/s Max: 19738 RPC/s
>[LNet Bandwidth of servers]
>[R] Avg: 1737.60 MB/s Min: 1680.70 MB/s Max: 1794.51 MB/s
>[W] Avg: 1737.60 MB/s Min: 1680.70 MB/s Max: 1794.51 MB/s
>[LNet Rates of servers]
>[R] Avg: 19510 RPC/s Min: 19182 RPC/s Max: 19838 RPC/s
>[W] Avg: 19510 RPC/s Min: 19182 RPC/s Max: 19838 RPC/s
>[LNet Bandwidth of servers]
>[R] Avg: 1741.67 MB/s Min: 1679.51 MB/s Max: 1803.83 MB/s
>[W] Avg: 1741.67 MB/s Min: 1679.51 MB/s Max: 1803.83 MB/s
>[LNet Rates of servers]
>[R] Avg: 19458 RPC/s Min: 19237 RPC/s Max: 19679 RPC/s
>[W] Avg: 19458 RPC/s Min: 19237 RPC/s Max: 19679 RPC/s
>[LNet Bandwidth of servers]
>[R] Avg: 1738.87 MB/s Min: 1687.28 MB/s Max: 1790.45 MB/s
>[W] Avg: 1738.87 MB/s Min: 1687.28 MB/s Max: 1790.45 MB/s
>[LNet Rates of servers]
>[R] Avg: 19587 RPC/s Min: 19293 RPC/s Max: 19880 RPC/s
>[W] Avg: 19586 RPC/s Min: 19293 RPC/s Max: 19880 RPC/s
>[LNet Bandwidth of servers]
>[R] Avg: 1752.62 MB/s Min: 1695.38 MB/s Max: 1809.85 MB/s
>[W] Avg: 1752.62 MB/s Min: 1695.38 MB/s Max: 1809.85 MB/s
>[LNet Rates of servers]
>[R] Avg: 19528 RPC/s Min: 19232 RPC/s Max: 19823 RPC/s
>[W] Avg: 19528 RPC/s Min: 19232 RPC/s Max: 19824 RPC/s
>[LNet Bandwidth of servers]
>[R] Avg: 1741.63 MB/s Min: 1682.29 MB/s Max: 1800.98 MB/s
>[W] Avg: 1741.63 MB/s Min: 1682.29 MB/s Max: 1800.98 MB/s
>session is ended
>./lnet-selftest.sh: line 17:  8835 Terminated    lst stat servers
>
>Addendum: I can start the MGS service on the 2nd node, and then start
>the mdt0 service on the local node:
># ssh mds2 service lustre start MGS
>Mounting lustre-mgs/mgs on /mnt/lustre/foreign/MGS
># service lustre start fs0-MDT0000
>Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
># service lustre status
>unhealthy
># service lustre status local
>running
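The "unhealthy" from the cluster-wide status while "status local" reports
running is a useful clue: the init script checks everything listed in
ldev.conf, and fs0-MDT0001 is foreign to this node, so it may simply be
reporting targets that are not running here. The kernel's own health
report can confirm whether the locally mounted targets are actually the
problem (a sketch, not from the original thread):

# lctl get_param health_check   # prints "healthy" if all local devices pass
# lctl dl                       # list the Lustre devices running on this node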
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss