I have 2 more questions:
1. Is a dual-node (failover) MGS supported with ZFS? My issue seems to occur when the MGS and an MDT are on the same node, while the MGS is configured for 2 nodes.
2. Which is recommended: ldiskfs with 2x MDT, or ZFS with a single MDT?
I assumed the LLNL Sequoia installation used ZFS with HA (dual MGS / dual MDT, active/passive), but I might be wrong on that account. Thanks for your help.

On 17/12/2013 23:13, Dilger, Andreas wrote:
> On 2013/12/17 9:37 AM, "Sten Wolf" <[email protected]> wrote:
>> This is my situation:
>> I have 2 nodes, MDS1 and MDS2 (10.0.0.22, 10.0.0.23), which I wish to use
>> as a failover MGS and active/active MDTs with ZFS.
>> I have a JBOD shelf with 12 disks, seen by both nodes as DAS (the shelf
>> has 2 SAS ports, connected to a SAS HBA on each node), and I am using
>> Lustre 2.4 on CentOS 6.4 x64.
> If you are using ZFS + DNE (multiple MDTs), I'd strongly recommend using
> Lustre 2.5 instead of 2.4. There were quite a few fixes in this version
> for both of those features (which are both new in 2.4). Also, Lustre 2.5
> is the new long-term maintenance stream, so there will be regular updates
> for that version.
>
> I have to admit that the combination of those two features has been tested
> less than either ZFS + 1 MDT or ldiskfs + 2+ MDTs separately. There are
> also a couple of known performance issues with the interaction of these
> features that are not yet fixed.
>
> I do expect that this combination works, but there will likely be some
> issues that haven't been seen before.
>
> Cheers, Andreas
>
>> I have created 3 ZFS pools:
>> 1.
>> mgs:
>> # zpool create -f -o ashift=12 -O canmount=off lustre-mgs mirror \
>>     /dev/disk/by-id/wwn-0x50000c0f012306fc \
>>     /dev/disk/by-id/wwn-0x50000c0f01233aec
>> # mkfs.lustre --mgs --servicenode=mds1@tcp0 --servicenode=mds2@tcp0 \
>>     --param sys.timeout=5000 --backfstype=zfs lustre-mgs/mgs
>>
>>    Permanent disk data:
>> Target:     MGS
>> Index:      unassigned
>> Lustre FS:
>> Mount type: zfs
>> Flags:      0x1064
>>             (MGS first_time update no_primnode )
>> Persistent mount opts:
>> Parameters: failover.node=10.0.0.22@tcp failover.node=10.0.0.23@tcp
>>             sys.timeout=5000
>>
>> 2. mdt0:
>> # zpool create -f -o ashift=12 -O canmount=off lustre-mdt0 mirror \
>>     /dev/disk/by-id/wwn-0x50000c0f01d07a34 \
>>     /dev/disk/by-id/wwn-0x50000c0f01d110c8
>> # mkfs.lustre --mdt --fsname=fs0 --servicenode=mds1@tcp0 \
>>     --servicenode=mds2@tcp0 --param sys.timeout=5000 --backfstype=zfs \
>>     --mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0 lustre-mdt0/mdt0
>> warning: lustre-mdt0/mdt0: for Lustre 2.4 and later, the target index
>>          must be specified with --index
>>
>>    Permanent disk data:
>> Target:     fs0:MDT0000
>> Index:      0
>> Lustre FS:  fs0
>> Mount type: zfs
>> Flags:      0x1061
>>             (MDT first_time update no_primnode )
>> Persistent mount opts:
>> Parameters: failover.node=10.0.0.22@tcp failover.node=10.0.0.23@tcp
>>             sys.timeout=5000 mgsnode=10.0.0.22@tcp mgsnode=10.0.0.23@tcp
>>
>> checking for existing Lustre data: not found
>> mkfs_cmd = zfs create -o canmount=off -o xattr=sa lustre-mdt0/mdt0
>> Writing lustre-mdt0/mdt0 properties
>>   lustre:version=1
>>   lustre:flags=4193
>>   lustre:index=0
>>   lustre:fsname=fs0
>>   lustre:svname=fs0:MDT0000
>>   lustre:failover.node=10.0.0.22@tcp
>>   lustre:failover.node=10.0.0.23@tcp
>>   lustre:sys.timeout=5000
>>   lustre:mgsnode=10.0.0.22@tcp
>>   lustre:mgsnode=10.0.0.23@tcp
>>
>> 3.
>> mdt1:
>> # zpool create -f -o ashift=12 -O canmount=off lustre-mdt1 mirror \
>>     /dev/disk/by-id/wwn-0x50000c0f01d113e0 \
>>     /dev/disk/by-id/wwn-0x50000c0f01d116fc
>> # mkfs.lustre --mdt --fsname=fs0 --servicenode=mds2@tcp0 \
>>     --servicenode=mds1@tcp0 --param sys.timeout=5000 --backfstype=zfs \
>>     --index=1 --mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0 lustre-mdt1/mdt1
>>
>>    Permanent disk data:
>> Target:     fs0:MDT0001
>> Index:      1
>> Lustre FS:  fs0
>> Mount type: zfs
>> Flags:      0x1061
>>             (MDT first_time update no_primnode )
>> Persistent mount opts:
>> Parameters: failover.node=10.0.0.23@tcp failover.node=10.0.0.22@tcp
>>             sys.timeout=5000 mgsnode=10.0.0.22@tcp mgsnode=10.0.0.23@tcp
>>
>> checking for existing Lustre data: not found
>> mkfs_cmd = zfs create -o canmount=off -o xattr=sa lustre-mdt1/mdt1
>> Writing lustre-mdt1/mdt1 properties
>>   lustre:version=1
>>   lustre:flags=4193
>>   lustre:index=1
>>   lustre:fsname=fs0
>>   lustre:svname=fs0:MDT0001
>>   lustre:failover.node=10.0.0.23@tcp
>>   lustre:failover.node=10.0.0.22@tcp
>>   lustre:sys.timeout=5000
>>   lustre:mgsnode=10.0.0.22@tcp
>>   lustre:mgsnode=10.0.0.23@tcp
>>
>> A few basic sanity checks:
>> # zfs list
>> NAME               USED  AVAIL  REFER  MOUNTPOINT
>> lustre-mdt0        824K  3.57T   136K  /lustre-mdt0
>> lustre-mdt0/mdt0   136K  3.57T   136K  /lustre-mdt0/mdt0
>> lustre-mdt1        716K  3.57T   136K  /lustre-mdt1
>> lustre-mdt1/mdt1   136K  3.57T   136K  /lustre-mdt1/mdt1
>> lustre-mgs        4.78M  3.57T   136K  /lustre-mgs
>> lustre-mgs/mgs    4.18M  3.57T  4.18M  /lustre-mgs/mgs
>>
>> # zpool list
>> NAME          SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
>> lustre-mdt0  3.62T  1.00M  3.62T   0%  1.00x  ONLINE  -
>> lustre-mdt1  3.62T   800K  3.62T   0%  1.00x  ONLINE  -
>> lustre-mgs   3.62T  4.86M  3.62T   0%  1.00x  ONLINE  -
>>
>> # zpool status
>>   pool: lustre-mdt0
>>  state: ONLINE
>>   scan: none requested
>> config:
>>
>>   NAME                        STATE   READ WRITE CKSUM
>>   lustre-mdt0                 ONLINE     0     0     0
>>     mirror-0                  ONLINE     0     0     0
>>       wwn-0x50000c0f01d07a34  ONLINE     0     0     0
>>       wwn-0x50000c0f01d110c8  ONLINE     0     0     0
>>
>> errors: No known data errors
>>   pool: lustre-mdt1
>>  state: ONLINE
>>   scan: none requested
>> config:
>>
>>   NAME                        STATE   READ WRITE CKSUM
>>   lustre-mdt1                 ONLINE     0     0     0
>>     mirror-0                  ONLINE     0     0     0
>>       wwn-0x50000c0f01d113e0  ONLINE     0     0     0
>>       wwn-0x50000c0f01d116fc  ONLINE     0     0     0
>>
>> errors: No known data errors
>>
>>   pool: lustre-mgs
>>  state: ONLINE
>>   scan: none requested
>> config:
>>
>>   NAME                        STATE   READ WRITE CKSUM
>>   lustre-mgs                  ONLINE     0     0     0
>>     mirror-0                  ONLINE     0     0     0
>>       wwn-0x50000c0f012306fc  ONLINE     0     0     0
>>       wwn-0x50000c0f01233aec  ONLINE     0     0     0
>>
>> errors: No known data errors
>>
>> # zfs get lustre:svname lustre-mgs/mgs
>> NAME            PROPERTY       VALUE  SOURCE
>> lustre-mgs/mgs  lustre:svname  MGS    local
>> # zfs get lustre:svname lustre-mdt0/mdt0
>> NAME              PROPERTY       VALUE        SOURCE
>> lustre-mdt0/mdt0  lustre:svname  fs0:MDT0000  local
>> # zfs get lustre:svname lustre-mdt1/mdt1
>> NAME              PROPERTY       VALUE        SOURCE
>> lustre-mdt1/mdt1  lustre:svname  fs0:MDT0001  local
>>
>> So far, so good.
>> My /etc/ldev.conf:
>> mds1 mds2 MGS zfs:lustre-mgs/mgs
>> mds1 mds2 fs0-MDT0000 zfs:lustre-mdt0/mdt0
>> mds2 mds1 fs0-MDT0001 zfs:lustre-mdt1/mdt1
>>
>> My /etc/modprobe.d/lustre.conf:
>> # options lnet networks=tcp0(em1)
>> options lnet ip2nets="tcp0 10.0.0.[22,23]; tcp0 10.0.0.*;"
>> -----------------------------------------------------------------------------
>>
>> Now, when starting the services, I get strange errors:
>> # service lustre start local
>> Mounting lustre-mgs/mgs on /mnt/lustre/local/MGS
>> Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
>> mount.lustre: mount lustre-mdt0/mdt0 at /mnt/lustre/local/fs0-MDT0000
>> failed: Input/output error
>> Is the MGS running?
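The EIO above suggests the MDT cannot reach a running MGS at mount time, so start ordering on the local node is worth checking. As a rough sketch (the `local_targets` helper below is my own illustration, not part of the Lustre init scripts), the ldev.conf shown above already encodes which targets each host should bring up, and a plain C-locale sort happens to put the MGS label ahead of the fs0-* MDT labels:

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre utility): list the targets a given host
# should mount, from an ldev.conf-style file whose columns are
# "primary failover label device". With LC_ALL=C, "MGS" sorts before
# "fs0-MDT0000", so the MGS would be mounted before any co-located MDT.
local_targets() {
    awk -v h="$1" '$1 == h { print $3 "\t" $4 }' "$2" | LC_ALL=C sort
}

# Example using the ldev.conf from this message:
cat > /tmp/ldev.conf <<'EOF'
mds1 mds2 MGS zfs:lustre-mgs/mgs
mds1 mds2 fs0-MDT0000 zfs:lustre-mdt0/mdt0
mds2 mds1 fs0-MDT0001 zfs:lustre-mdt1/mdt1
EOF

local_targets mds1 /tmp/ldev.conf
# MGS           zfs:lustre-mgs/mgs
# fs0-MDT0000   zfs:lustre-mdt0/mdt0
```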
>> # service lustre status local
>> running
>>
>> (attached: lctl-dk.local01)
>>
>> If I run the same command again, I get a different error:
>>
>> # service lustre start local
>> Mounting lustre-mgs/mgs on /mnt/lustre/local/MGS
>> mount.lustre: according to /etc/mtab lustre-mgs/mgs is already mounted
>> on /mnt/lustre/local/MGS
>> Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
>> mount.lustre: mount lustre-mdt0/mdt0 at /mnt/lustre/local/fs0-MDT0000
>> failed: File exists
>>
>> (attached: lctl-dk.local02)
>>
>> What am I doing wrong?
>> I have also tested LNet with lnet-selftest, using the following script:
>> # cat lnet-selftest.sh
>> #!/bin/bash
>> export LST_SESSION=$$
>> lst new_session read/write
>> lst add_group servers 10.0.0.[22,23]@tcp
>> lst add_group readers 10.0.0.[22,23]@tcp
>> lst add_group writers 10.0.0.[22,23]@tcp
>> lst add_batch bulk_rw
>> lst add_test --batch bulk_rw --from readers --to servers \
>>     brw read check=simple size=1M
>> lst add_test --batch bulk_rw --from writers --to servers \
>>     brw write check=full size=4K
>> # start running
>> lst run bulk_rw
>> # display server stats for 30 seconds
>> lst stat servers & sleep 30; kill $!
>> # tear down
>> lst end_session
>>
>> and it seemed OK:
>> # modprobe lnet-selftest && ssh mds2 modprobe lnet-selftest
>> # ./lnet-selftest.sh
>> SESSION: read/write FEATURES: 0 TIMEOUT: 300 FORCE: No
>> 10.0.0.[22,23]@tcp are added to session
>> 10.0.0.[22,23]@tcp are added to session
>> 10.0.0.[22,23]@tcp are added to session
>> Test was added successfully
>> Test was added successfully
>> bulk_rw is running now
>> [LNet Rates of servers]
>> [R] Avg: 19486 RPC/s  Min: 19234 RPC/s  Max: 19739 RPC/s
>> [W] Avg: 19486 RPC/s  Min: 19234 RPC/s  Max: 19738 RPC/s
>> [LNet Bandwidth of servers]
>> [R] Avg: 1737.60 MB/s  Min: 1680.70 MB/s  Max: 1794.51 MB/s
>> [W] Avg: 1737.60 MB/s  Min: 1680.70 MB/s  Max: 1794.51 MB/s
>> [LNet Rates of servers]
>> [R] Avg: 19510 RPC/s  Min: 19182 RPC/s  Max: 19838 RPC/s
>> [W] Avg: 19510 RPC/s  Min: 19182 RPC/s  Max: 19838 RPC/s
>> [LNet Bandwidth of servers]
>> [R] Avg: 1741.67 MB/s  Min: 1679.51 MB/s  Max: 1803.83 MB/s
>> [W] Avg: 1741.67 MB/s  Min: 1679.51 MB/s  Max: 1803.83 MB/s
>> [LNet Rates of servers]
>> [R] Avg: 19458 RPC/s  Min: 19237 RPC/s  Max: 19679 RPC/s
>> [W] Avg: 19458 RPC/s  Min: 19237 RPC/s  Max: 19679 RPC/s
>> [LNet Bandwidth of servers]
>> [R] Avg: 1738.87 MB/s  Min: 1687.28 MB/s  Max: 1790.45 MB/s
>> [W] Avg: 1738.87 MB/s  Min: 1687.28 MB/s  Max: 1790.45 MB/s
>> [LNet Rates of servers]
>> [R] Avg: 19587 RPC/s  Min: 19293 RPC/s  Max: 19880 RPC/s
>> [W] Avg: 19586 RPC/s  Min: 19293 RPC/s  Max: 19880 RPC/s
>> [LNet Bandwidth of servers]
>> [R] Avg: 1752.62 MB/s  Min: 1695.38 MB/s  Max: 1809.85 MB/s
>> [W] Avg: 1752.62 MB/s  Min: 1695.38 MB/s  Max: 1809.85 MB/s
>> [LNet Rates of servers]
>> [R] Avg: 19528 RPC/s  Min: 19232 RPC/s  Max: 19823 RPC/s
>> [W] Avg: 19528 RPC/s  Min: 19232 RPC/s  Max: 19824 RPC/s
>> [LNet Bandwidth of servers]
>> [R] Avg: 1741.63 MB/s  Min: 1682.29 MB/s  Max: 1800.98 MB/s
>> [W] Avg: 1741.63 MB/s  Min: 1682.29 MB/s  Max: 1800.98 MB/s
>> session is ended
>> ./lnet-selftest.sh: line 17:  8835 Terminated              lst stat servers
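Since LNet itself checks out at full bandwidth, the remaining suspect is whether the MGS is actually up and answering before the MDT mount is attempted. A small retry wrapper (my own sketch, not a Lustre utility) could poll for that; the `lctl ping` / `service lustre` lines in the comment are the intended use on a real MDS and are shown for illustration only:

```shell
#!/bin/sh
# Hypothetical wrapper: retry a command up to N times, one second apart,
# until it succeeds. Returns 0 on success, 1 if all attempts fail.
wait_for() {
    tries=$1; shift
    while [ "$tries" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Intended use on mds1 (requires a running Lustre stack):
#   service lustre start MGS
#   wait_for 30 lctl ping 10.0.0.22@tcp
#   service lustre start fs0-MDT0000
```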
Addendum - I can start the MGS service on the 2nd node, and then start the
mdt0 service on the local node:
>> # ssh mds2 service lustre start MGS
>> Mounting lustre-mgs/mgs on /mnt/lustre/foreign/MGS
>> # service lustre start fs0-MDT0000
>> Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
>> # service lustre status
>> unhealthy
>> # service lustre status local
>> running
>
> Cheers, Andreas

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
