On 2013/12/17 9:37 AM, "Sten Wolf" <s...@checkpalm.com> wrote:
>This is my situation:
>I have 2 nodes MDS1 , MDS2 (10.0.0.22 , 10.0.0.23) I wish to use as
>failover MGS, active/active MDT with zfs.
>I have a jbod shelf with 12 disks, seen by both nodes as das (the shelf
>has 2 sas ports, connected to a sas hba on each node), and I am using
>lustre 2.4 on centos 6.4 x64

If you are using ZFS + DNE (multiple MDTs), I'd strongly recommend using
Lustre 2.5 instead of 2.4.  There were quite a few fixes in that release
for both of those features (which are both new in 2.4).  Also, Lustre 2.5
is the new long-term maintenance stream, so it will receive regular
updates.

I have to admit that the combination of those two features has been tested
less than either ZFS with a single MDT, or ldiskfs with multiple MDTs, on
its own.  There are also a couple of known performance issues in the
interaction of these features that are not yet fixed.

I do expect this combination to work, but there will likely be some
issues that haven't been seen before.
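
To double-check which version is actually installed and running on each
server node, something along these lines should do (assuming the standard
packaging; exact package names can vary with your build):

# rpm -q lustre zfs          # installed server and ZFS package versions
# lctl get_param version     # version of the currently loaded Lustre modules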

Cheers, Andreas

>I have created 3 zfs pools:
>1. mgs:
># zpool create -f -o ashift=12 -O canmount=off lustre-mgs mirror
>/dev/disk/by-id/wwn-0x50000c0f012306fc
>/dev/disk/by-id/wwn-0x50000c0f01233aec
># mkfs.lustre --mgs --servicenode=mds1@tcp0 --servicenode=mds2@tcp0
>--param sys.timeout=5000 --backfstype=zfs lustre-mgs/mgs
>
>    Permanent disk data:
>Target:     MGS
>Index:      unassigned
>Lustre FS:
>Mount type: zfs
>Flags:      0x1064
>               (MGS first_time update no_primnode )
>Persistent mount opts:
>Parameters: failover.node=10.0.0.22@tcp failover.node=10.0.0.23@tcp
>sys.timeout=5000
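
Just as a cross-check: everything mkfs.lustre reports under "Parameters"
is stored as ZFS user properties on the dataset, so it can be read back
later with plain zfs tools, e.g.:

# zfs get -H -o property,value all lustre-mgs/mgs | grep '^lustre:'

which should list the same failover.node and sys.timeout values.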
>
>2. mdt0:
># zpool create -f -o ashift=12 -O canmount=off lustre-mdt0 mirror
>/dev/disk/by-id/wwn-0x50000c0f01d07a34
>/dev/disk/by-id/wwn-0x50000c0f01d110c8
># mkfs.lustre --mdt --fsname=fs0 --servicenode=mds1@tcp0
>--servicenode=mds2@tcp0 --param sys.timeout=5000 --backfstype=zfs
>--mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0  lustre-mdt0/mdt0
>warning: lustre-mdt0/mdt0: for Lustre 2.4 and later, the target index
>must be specified with --index
>
>    Permanent disk data:
>Target:     fs0:MDT0000
>Index:      0
>Lustre FS:  fs0
>Mount type: zfs
>Flags:      0x1061
>               (MDT first_time update no_primnode )
>Persistent mount opts:
>Parameters: failover.node=10.0.0.22@tcp failover.node=10.0.0.23@tcp
>sys.timeout=5000 mgsnode=10.0.0.22@tcp mgsnode=10.0.0.23@tcp
>
>checking for existing Lustre data: not found
>mkfs_cmd = zfs create -o canmount=off -o xattr=sa lustre-mdt0/mdt0
>Writing lustre-mdt0/mdt0 properties
>   lustre:version=1
>   lustre:flags=4193
>   lustre:index=0
>   lustre:fsname=fs0
>   lustre:svname=fs0:MDT0000
>   lustre:failover.node=10.0.0.22@tcp
>   lustre:failover.node=10.0.0.23@tcp
>   lustre:sys.timeout=5000
>   lustre:mgsnode=10.0.0.22@tcp
>   lustre:mgsnode=10.0.0.23@tcp
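
Side note on the "--index" warning above: since Lustre 2.4 the MDT index
should be given explicitly, so adding --index=0 here (the same way
--index=1 is used for mdt1 below) would avoid the warning, e.g.:

# mkfs.lustre --mdt --fsname=fs0 --index=0 --servicenode=mds1@tcp0 \
      --servicenode=mds2@tcp0 --mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0 \
      --param sys.timeout=5000 --backfstype=zfs lustre-mdt0/mdt0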
>
>3. mdt1:
># zpool create -f -o ashift=12 -O canmount=off lustre-mdt1 mirror
>/dev/disk/by-id/wwn-0x50000c0f01d113e0
>/dev/disk/by-id/wwn-0x50000c0f01d116fc
># mkfs.lustre --mdt --fsname=fs0 --servicenode=mds2@tcp0
>--servicenode=mds1@tcp0 --param sys.timeout=5000 --backfstype=zfs
>--index=1 --mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0  lustre-mdt1/mdt1
>
>    Permanent disk data:
>Target:     fs0:MDT0001
>Index:      1
>Lustre FS:  fs0
>Mount type: zfs
>Flags:      0x1061
>               (MDT first_time update no_primnode )
>Persistent mount opts:
>Parameters: failover.node=10.0.0.23@tcp failover.node=10.0.0.22@tcp
>sys.timeout=5000 mgsnode=10.0.0.22@tcp mgsnode=10.0.0.23@tcp
>
>checking for existing Lustre data: not found
>mkfs_cmd = zfs create -o canmount=off -o xattr=sa lustre-mdt1/mdt1
>Writing lustre-mdt1/mdt1 properties
>   lustre:version=1
>   lustre:flags=4193
>   lustre:index=1
>   lustre:fsname=fs0
>   lustre:svname=fs0:MDT0001
>   lustre:failover.node=10.0.0.23@tcp
>   lustre:failover.node=10.0.0.22@tcp
>   lustre:sys.timeout=5000
>   lustre:mgsnode=10.0.0.22@tcp
>   lustre:mgsnode=10.0.0.23@tcp
>
>a few basic sanity checks:
># zfs list
>NAME               USED  AVAIL  REFER  MOUNTPOINT
>lustre-mdt0        824K  3.57T   136K  /lustre-mdt0
>lustre-mdt0/mdt0   136K  3.57T   136K  /lustre-mdt0/mdt0
>lustre-mdt1        716K  3.57T   136K  /lustre-mdt1
>lustre-mdt1/mdt1   136K  3.57T   136K  /lustre-mdt1/mdt1
>lustre-mgs        4.78M  3.57T   136K  /lustre-mgs
>lustre-mgs/mgs    4.18M  3.57T  4.18M  /lustre-mgs/mgs
>
># zpool list
>NAME          SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
>lustre-mdt0  3.62T  1.00M  3.62T     0%  1.00x  ONLINE  -
>lustre-mdt1  3.62T   800K  3.62T     0%  1.00x  ONLINE  -
>lustre-mgs   3.62T  4.86M  3.62T     0%  1.00x  ONLINE  -
>
># zpool status
>   pool: lustre-mdt0
>  state: ONLINE
>   scan: none requested
>config:
>
>     NAME                        STATE     READ WRITE CKSUM
>     lustre-mdt0                 ONLINE       0     0     0
>       mirror-0                  ONLINE       0     0     0
>         wwn-0x50000c0f01d07a34  ONLINE       0     0     0
>         wwn-0x50000c0f01d110c8  ONLINE       0     0     0
>
>errors: No known data errors
>
>   pool: lustre-mdt1
>  state: ONLINE
>   scan: none requested
>config:
>
>     NAME                        STATE     READ WRITE CKSUM
>     lustre-mdt1                 ONLINE       0     0     0
>       mirror-0                  ONLINE       0     0     0
>         wwn-0x50000c0f01d113e0  ONLINE       0     0     0
>         wwn-0x50000c0f01d116fc  ONLINE       0     0     0
>
>errors: No known data errors
>
>   pool: lustre-mgs
>  state: ONLINE
>   scan: none requested
>config:
>
>     NAME                        STATE     READ WRITE CKSUM
>     lustre-mgs                  ONLINE       0     0     0
>       mirror-0                  ONLINE       0     0     0
>         wwn-0x50000c0f012306fc  ONLINE       0     0     0
>         wwn-0x50000c0f01233aec  ONLINE       0     0     0
>
>errors: No known data errors
># zfs get lustre:svname lustre-mgs/mgs
>NAME            PROPERTY       VALUE          SOURCE
>lustre-mgs/mgs  lustre:svname  MGS            local
># zfs get lustre:svname lustre-mdt0/mdt0
>NAME              PROPERTY       VALUE          SOURCE
>lustre-mdt0/mdt0  lustre:svname  fs0:MDT0000    local
># zfs get lustre:svname lustre-mdt1/mdt1
>NAME              PROPERTY       VALUE          SOURCE
>lustre-mdt1/mdt1  lustre:svname  fs0:MDT0001    local
>
>So far, so good.
>My /etc/ldev.conf:
>mds1 mds2 MGS zfs:lustre-mgs/mgs
>mds1 mds2 fs0-MDT0000 zfs:lustre-mdt0/mdt0
>mds2 mds1 fs0-MDT0001 zfs:lustre-mdt1/mdt1
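
For reference, those ldev.conf entries just tell the init script which
targets are "local" vs. "foreign" on each host; mounting one target by
hand is roughly equivalent to (mount point as used by the script):

# mkdir -p /mnt/lustre/local/MGS
# mount -t lustre lustre-mgs/mgs /mnt/lustre/local/MGS

which can help separate init-script problems from actual mount problems.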
>
>my /etc/modprobe.d/lustre.conf
># options lnet networks=tcp0(em1)
>options lnet ip2nets="tcp0 10.0.0.[22,23]; tcp0 10.0.0.*;"
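
To confirm that this ip2nets line gives each node the NID expected by the
--servicenode/--mgsnode options, something like the following on each
server should work:

# modprobe lnet
# lctl network up
# lctl list_nids     # expect 10.0.0.22@tcp on mds1, 10.0.0.23@tcp on mds2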
>---------------------------------------------------------------------------
>
>Now, when starting the services, I get strange errors:
># service lustre start local
>Mounting lustre-mgs/mgs on /mnt/lustre/local/MGS
>Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
>mount.lustre: mount lustre-mdt0/mdt0 at /mnt/lustre/local/fs0-MDT0000
>failed: Input/output error
>Is the MGS running?
># service lustre status local
>running
>
>attached lctl-dk.local01
>
>If I run the same command again, I get a different error:
>
># service lustre start local
>Mounting lustre-mgs/mgs on /mnt/lustre/local/MGS
>mount.lustre: according to /etc/mtab lustre-mgs/mgs is already mounted
>on /mnt/lustre/local/MGS
>Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
>mount.lustre: mount lustre-mdt0/mdt0 at /mnt/lustre/local/fs0-MDT0000
>failed: File exists
>
>attached lctl-dk.local02
>
>What am I doing wrong?
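
When mount.lustre asks "Is the MGS running?", it is worth checking on the
MGS node whether the MGS target really came up and is still mounted before
retrying the MDT mount, e.g.:

# mount -t lustre     # should show lustre-mgs/mgs on /mnt/lustre/local/MGS
# lctl dl             # should list an MGS device in the UP state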
>I have tested lnet self-test as well, using the following script:
># cat lnet-selftest.sh
>#!/bin/bash
>export LST_SESSION=$$
>lst new_session read/write
>lst add_group servers 10.0.0.[22,23]@tcp
>lst add_group readers 10.0.0.[22,23]@tcp
>lst add_group writers 10.0.0.[22,23]@tcp
>lst add_batch bulk_rw
>lst add_test --batch bulk_rw --from readers --to servers \
>brw read check=simple size=1M
>lst add_test --batch bulk_rw --from writers --to servers \
>brw write check=full size=4K
># start running
>lst run bulk_rw
># display server stats for 30 seconds
>lst stat servers & sleep 30; kill $!
># tear down
>lst end_session
>
>and it seemed OK:
># modprobe lnet-selftest && ssh mds2 modprobe lnet-selftest
># ./lnet-selftest.sh
>SESSION: read/write FEATURES: 0 TIMEOUT: 300 FORCE: No
>10.0.0.[22,23]@tcp are added to session
>10.0.0.[22,23]@tcp are added to session
>10.0.0.[22,23]@tcp are added to session
>Test was added successfully
>Test was added successfully
>bulk_rw is running now
>[LNet Rates of servers]
>[R] Avg: 19486    RPC/s Min: 19234    RPC/s Max: 19739    RPC/s
>[W] Avg: 19486    RPC/s Min: 19234    RPC/s Max: 19738    RPC/s
>[LNet Bandwidth of servers]
>[R] Avg: 1737.60  MB/s  Min: 1680.70  MB/s  Max: 1794.51  MB/s
>[W] Avg: 1737.60  MB/s  Min: 1680.70  MB/s  Max: 1794.51  MB/s
>[LNet Rates of servers]
>[R] Avg: 19510    RPC/s Min: 19182    RPC/s Max: 19838    RPC/s
>[W] Avg: 19510    RPC/s Min: 19182    RPC/s Max: 19838    RPC/s
>[LNet Bandwidth of servers]
>[R] Avg: 1741.67  MB/s  Min: 1679.51  MB/s  Max: 1803.83  MB/s
>[W] Avg: 1741.67  MB/s  Min: 1679.51  MB/s  Max: 1803.83  MB/s
>[LNet Rates of servers]
>[R] Avg: 19458    RPC/s Min: 19237    RPC/s Max: 19679    RPC/s
>[W] Avg: 19458    RPC/s Min: 19237    RPC/s Max: 19679    RPC/s
>[LNet Bandwidth of servers]
>[R] Avg: 1738.87  MB/s  Min: 1687.28  MB/s  Max: 1790.45  MB/s
>[W] Avg: 1738.87  MB/s  Min: 1687.28  MB/s  Max: 1790.45  MB/s
>[LNet Rates of servers]
>[R] Avg: 19587    RPC/s Min: 19293    RPC/s Max: 19880    RPC/s
>[W] Avg: 19586    RPC/s Min: 19293    RPC/s Max: 19880    RPC/s
>[LNet Bandwidth of servers]
>[R] Avg: 1752.62  MB/s  Min: 1695.38  MB/s  Max: 1809.85  MB/s
>[W] Avg: 1752.62  MB/s  Min: 1695.38  MB/s  Max: 1809.85  MB/s
>[LNet Rates of servers]
>[R] Avg: 19528    RPC/s Min: 19232    RPC/s Max: 19823    RPC/s
>[W] Avg: 19528    RPC/s Min: 19232    RPC/s Max: 19824    RPC/s
>[LNet Bandwidth of servers]
>[R] Avg: 1741.63  MB/s  Min: 1682.29  MB/s  Max: 1800.98  MB/s
>[W] Avg: 1741.63  MB/s  Min: 1682.29  MB/s  Max: 1800.98  MB/s
>session is ended
>./lnet-selftest.sh: line 17:  8835 Terminated              lst stat servers
>
>
>Addendum - I can start the MGS service on the 2nd node, and then start
>the mdt0 service on the local node:
># ssh mds2 service lustre start MGS
>Mounting lustre-mgs/mgs on /mnt/lustre/foreign/MGS
># service lustre start fs0-MDT0000
>Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
># service lustre status
>unhealthy
># service lustre status local
>running
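
The "unhealthy" vs. "running" status from the init script is normally
derived from the kernel health flag, so it is worth comparing what each
node reports directly (exact checks differ a bit between script versions):

# lctl get_param -n health_check     # expect "healthy" on a good node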
>


-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division

