I have 2 more questions:
1. Is a dual-node (failover) MGS supported with ZFS? My issue seems to occur when the MGS and an MDT are on the same node, while the MGS is configured for 2 nodes.
2. Which is recommended: ldiskfs with 2x MDT, or ZFS with a single MDT?
I assumed the LLNL Sequoia installation used ZFS with HA (dual MGS / dual MDT, active/passive), but I might be wrong on that account. Thanks for your help.

On 17/12/2013 23:13, Dilger, Andreas wrote:
> On 2013/12/17 9:37 AM, "Sten Wolf" <[email protected]> wrote:
>> This is my situation:
>> I have 2 nodes, MDS1 and MDS2 (10.0.0.22, 10.0.0.23), which I wish to use
>> as a failover MGS and active/active MDTs with ZFS.
>> I have a JBOD shelf with 12 disks, seen by both nodes as DAS (the shelf
>> has 2 SAS ports, connected to a SAS HBA on each node), and I am using
>> Lustre 2.4 on CentOS 6.4 x64.
> If you are using ZFS + DNE (multiple MDTs), I'd strongly recommend using
> Lustre 2.5 instead of 2.4. There were quite a few fixes in this version
> for both of those features (which are both new in 2.4). Also, Lustre 2.5
> is the new long-term maintenance stream, so there will be regular updates
> for that version.
>
> I have to admit that the combination of those two features has been tested
> less than either ZFS + 1 MDT or ldiskfs + 2+ MDTs separately. There are
> also a couple of known performance issues with the interaction of these
> features that are not yet fixed.
>
> I do expect that this combination works, but there will likely be some
> issues that haven't been seen before.
>
> Cheers, Andreas
>
>> I have created 3 ZFS pools:
>> 1.
>> mgs:
>> # zpool create -f -o ashift=12 -O canmount=off lustre-mgs mirror \
>>     /dev/disk/by-id/wwn-0x50000c0f012306fc \
>>     /dev/disk/by-id/wwn-0x50000c0f01233aec
>> # mkfs.lustre --mgs --servicenode=mds1@tcp0 --servicenode=mds2@tcp0 \
>>     --param sys.timeout=5000 --backfstype=zfs lustre-mgs/mgs
>>
>>    Permanent disk data:
>> Target:     MGS
>> Index:      unassigned
>> Lustre FS:
>> Mount type: zfs
>> Flags:      0x1064
>>             (MGS first_time update no_primnode )
>> Persistent mount opts:
>> Parameters: failover.node=10.0.0.22@tcp failover.node=10.0.0.23@tcp
>>             sys.timeout=5000
>>
>> 2. mdt0:
>> # zpool create -f -o ashift=12 -O canmount=off lustre-mdt0 mirror \
>>     /dev/disk/by-id/wwn-0x50000c0f01d07a34 \
>>     /dev/disk/by-id/wwn-0x50000c0f01d110c8
>> # mkfs.lustre --mdt --fsname=fs0 --servicenode=mds1@tcp0 \
>>     --servicenode=mds2@tcp0 --param sys.timeout=5000 --backfstype=zfs \
>>     --mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0 lustre-mdt0/mdt0
>> warning: lustre-mdt0/mdt0: for Lustre 2.4 and later, the target index
>>          must be specified with --index
>>
>>    Permanent disk data:
>> Target:     fs0:MDT0000
>> Index:      0
>> Lustre FS:  fs0
>> Mount type: zfs
>> Flags:      0x1061
>>             (MDT first_time update no_primnode )
>> Persistent mount opts:
>> Parameters: failover.node=10.0.0.22@tcp failover.node=10.0.0.23@tcp
>>             sys.timeout=5000 mgsnode=10.0.0.22@tcp mgsnode=10.0.0.23@tcp
>>
>> checking for existing Lustre data: not found
>> mkfs_cmd = zfs create -o canmount=off -o xattr=sa lustre-mdt0/mdt0
>> Writing lustre-mdt0/mdt0 properties
>>   lustre:version=1
>>   lustre:flags=4193
>>   lustre:index=0
>>   lustre:fsname=fs0
>>   lustre:svname=fs0:MDT0000
>>   lustre:failover.node=10.0.0.22@tcp
>>   lustre:failover.node=10.0.0.23@tcp
>>   lustre:sys.timeout=5000
>>   lustre:mgsnode=10.0.0.22@tcp
>>   lustre:mgsnode=10.0.0.23@tcp
>>
>> 3.
>> mdt1:
>> # zpool create -f -o ashift=12 -O canmount=off lustre-mdt1 mirror \
>>     /dev/disk/by-id/wwn-0x50000c0f01d113e0 \
>>     /dev/disk/by-id/wwn-0x50000c0f01d116fc
>> # mkfs.lustre --mdt --fsname=fs0 --servicenode=mds2@tcp0 \
>>     --servicenode=mds1@tcp0 --param sys.timeout=5000 --backfstype=zfs \
>>     --index=1 --mgsnode=mds1@tcp0 --mgsnode=mds2@tcp0 lustre-mdt1/mdt1
>>
>>    Permanent disk data:
>> Target:     fs0:MDT0001
>> Index:      1
>> Lustre FS:  fs0
>> Mount type: zfs
>> Flags:      0x1061
>>             (MDT first_time update no_primnode )
>> Persistent mount opts:
>> Parameters: failover.node=10.0.0.23@tcp failover.node=10.0.0.22@tcp
>>             sys.timeout=5000 mgsnode=10.0.0.22@tcp mgsnode=10.0.0.23@tcp
>>
>> checking for existing Lustre data: not found
>> mkfs_cmd = zfs create -o canmount=off -o xattr=sa lustre-mdt1/mdt1
>> Writing lustre-mdt1/mdt1 properties
>>   lustre:version=1
>>   lustre:flags=4193
>>   lustre:index=1
>>   lustre:fsname=fs0
>>   lustre:svname=fs0:MDT0001
>>   lustre:failover.node=10.0.0.23@tcp
>>   lustre:failover.node=10.0.0.22@tcp
>>   lustre:sys.timeout=5000
>>   lustre:mgsnode=10.0.0.22@tcp
>>   lustre:mgsnode=10.0.0.23@tcp
>>
>> A few basic sanity checks:
>> # zfs list
>> NAME               USED  AVAIL  REFER  MOUNTPOINT
>> lustre-mdt0        824K  3.57T   136K  /lustre-mdt0
>> lustre-mdt0/mdt0   136K  3.57T   136K  /lustre-mdt0/mdt0
>> lustre-mdt1        716K  3.57T   136K  /lustre-mdt1
>> lustre-mdt1/mdt1   136K  3.57T   136K  /lustre-mdt1/mdt1
>> lustre-mgs        4.78M  3.57T   136K  /lustre-mgs
>> lustre-mgs/mgs    4.18M  3.57T  4.18M  /lustre-mgs/mgs
>>
>> # zpool list
>> NAME          SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
>> lustre-mdt0  3.62T  1.00M  3.62T   0%  1.00x  ONLINE  -
>> lustre-mdt1  3.62T   800K  3.62T   0%  1.00x  ONLINE  -
>> lustre-mgs   3.62T  4.86M  3.62T   0%  1.00x  ONLINE  -
>>
>> # zpool status
>>   pool: lustre-mdt0
>>  state: ONLINE
>>   scan: none requested
>> config:
>>
>>   NAME                        STATE   READ WRITE CKSUM
>>   lustre-mdt0                 ONLINE     0     0     0
>>     mirror-0                  ONLINE     0     0     0
>>       wwn-0x50000c0f01d07a34  ONLINE     0     0     0
>>       wwn-0x50000c0f01d110c8  ONLINE     0     0     0
>>
>> errors: No known data errors
>>   pool: lustre-mdt1
>>  state: ONLINE
>>   scan: none requested
>> config:
>>
>>   NAME                        STATE   READ WRITE CKSUM
>>   lustre-mdt1                 ONLINE     0     0     0
>>     mirror-0                  ONLINE     0     0     0
>>       wwn-0x50000c0f01d113e0  ONLINE     0     0     0
>>       wwn-0x50000c0f01d116fc  ONLINE     0     0     0
>>
>> errors: No known data errors
>>
>>   pool: lustre-mgs
>>  state: ONLINE
>>   scan: none requested
>> config:
>>
>>   NAME                        STATE   READ WRITE CKSUM
>>   lustre-mgs                  ONLINE     0     0     0
>>     mirror-0                  ONLINE     0     0     0
>>       wwn-0x50000c0f012306fc  ONLINE     0     0     0
>>       wwn-0x50000c0f01233aec  ONLINE     0     0     0
>>
>> errors: No known data errors
>>
>> # zfs get lustre:svname lustre-mgs/mgs
>> NAME            PROPERTY       VALUE  SOURCE
>> lustre-mgs/mgs  lustre:svname  MGS    local
>> # zfs get lustre:svname lustre-mdt0/mdt0
>> NAME              PROPERTY       VALUE        SOURCE
>> lustre-mdt0/mdt0  lustre:svname  fs0:MDT0000  local
>> # zfs get lustre:svname lustre-mdt1/mdt1
>> NAME              PROPERTY       VALUE        SOURCE
>> lustre-mdt1/mdt1  lustre:svname  fs0:MDT0001  local
>>
>> So far, so good.
>> My /etc/ldev.conf:
>> mds1 mds2 MGS zfs:lustre-mgs/mgs
>> mds1 mds2 fs0-MDT0000 zfs:lustre-mdt0/mdt0
>> mds2 mds1 fs0-MDT0001 zfs:lustre-mdt1/mdt1
>>
>> My /etc/modprobe.d/lustre.conf:
>> # options lnet networks=tcp0(em1)
>> options lnet ip2nets="tcp0 10.0.0.[22,23]; tcp0 10.0.0.*;"
>> -----------------------------------------------------------------------------
>>
>> Now, when starting the services, I get strange errors:
>> # service lustre start local
>> Mounting lustre-mgs/mgs on /mnt/lustre/local/MGS
>> Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
>> mount.lustre: mount lustre-mdt0/mdt0 at /mnt/lustre/local/fs0-MDT0000
>> failed: Input/output error
>> Is the MGS running?
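The EIO above suggests the MDT cannot reach a running MGS at mount time, so start ordering on the local node is worth checking. As a rough sketch (the `local_targets` helper below is my own illustration, not part of the Lustre init scripts), the ldev.conf shown above already encodes which targets each host should bring up, and a plain C-locale sort happens to put the MGS label ahead of the fs0-* MDT labels:

```shell
#!/bin/sh
# Hypothetical helper (not a Lustre utility): list the targets a given host
# should mount, from an ldev.conf-style file whose columns are
# "primary failover label device". With LC_ALL=C, "MGS" sorts before
# "fs0-MDT0000", so the MGS would be mounted before any co-located MDT.
local_targets() {
    awk -v h="$1" '$1 == h { print $3 "\t" $4 }' "$2" | LC_ALL=C sort
}

# Example using the ldev.conf from this message:
cat > /tmp/ldev.conf <<'EOF'
mds1 mds2 MGS zfs:lustre-mgs/mgs
mds1 mds2 fs0-MDT0000 zfs:lustre-mdt0/mdt0
mds2 mds1 fs0-MDT0001 zfs:lustre-mdt1/mdt1
EOF

local_targets mds1 /tmp/ldev.conf
# MGS           zfs:lustre-mgs/mgs
# fs0-MDT0000   zfs:lustre-mdt0/mdt0
```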
>> # service lustre status local
>> running
>>
>> (attached: lctl-dk.local01)
>>
>> If I run the same command again, I get a different error:
>>
>> # service lustre start local
>> Mounting lustre-mgs/mgs on /mnt/lustre/local/MGS
>> mount.lustre: according to /etc/mtab lustre-mgs/mgs is already mounted
>> on /mnt/lustre/local/MGS
>> Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
>> mount.lustre: mount lustre-mdt0/mdt0 at /mnt/lustre/local/fs0-MDT0000
>> failed: File exists
>>
>> (attached: lctl-dk.local02)
>>
>> What am I doing wrong?
>> I have also tested LNet with lnet-selftest, using the following script:
>> # cat lnet-selftest.sh
>> #!/bin/bash
>> export LST_SESSION=$$
>> lst new_session read/write
>> lst add_group servers 10.0.0.[22,23]@tcp
>> lst add_group readers 10.0.0.[22,23]@tcp
>> lst add_group writers 10.0.0.[22,23]@tcp
>> lst add_batch bulk_rw
>> lst add_test --batch bulk_rw --from readers --to servers \
>>     brw read check=simple size=1M
>> lst add_test --batch bulk_rw --from writers --to servers \
>>     brw write check=full size=4K
>> # start running
>> lst run bulk_rw
>> # display server stats for 30 seconds
>> lst stat servers & sleep 30; kill $!
>> # tear down
>> lst end_session
>>
>> and it seemed OK:
>> # modprobe lnet-selftest && ssh mds2 modprobe lnet-selftest
>> # ./lnet-selftest.sh
>> SESSION: read/write FEATURES: 0 TIMEOUT: 300 FORCE: No
>> 10.0.0.[22,23]@tcp are added to session
>> 10.0.0.[22,23]@tcp are added to session
>> 10.0.0.[22,23]@tcp are added to session
>> Test was added successfully
>> Test was added successfully
>> bulk_rw is running now
>> [LNet Rates of servers]
>> [R] Avg: 19486 RPC/s  Min: 19234 RPC/s  Max: 19739 RPC/s
>> [W] Avg: 19486 RPC/s  Min: 19234 RPC/s  Max: 19738 RPC/s
>> [LNet Bandwidth of servers]
>> [R] Avg: 1737.60 MB/s  Min: 1680.70 MB/s  Max: 1794.51 MB/s
>> [W] Avg: 1737.60 MB/s  Min: 1680.70 MB/s  Max: 1794.51 MB/s
>> [LNet Rates of servers]
>> [R] Avg: 19510 RPC/s  Min: 19182 RPC/s  Max: 19838 RPC/s
>> [W] Avg: 19510 RPC/s  Min: 19182 RPC/s  Max: 19838 RPC/s
>> [LNet Bandwidth of servers]
>> [R] Avg: 1741.67 MB/s  Min: 1679.51 MB/s  Max: 1803.83 MB/s
>> [W] Avg: 1741.67 MB/s  Min: 1679.51 MB/s  Max: 1803.83 MB/s
>> [LNet Rates of servers]
>> [R] Avg: 19458 RPC/s  Min: 19237 RPC/s  Max: 19679 RPC/s
>> [W] Avg: 19458 RPC/s  Min: 19237 RPC/s  Max: 19679 RPC/s
>> [LNet Bandwidth of servers]
>> [R] Avg: 1738.87 MB/s  Min: 1687.28 MB/s  Max: 1790.45 MB/s
>> [W] Avg: 1738.87 MB/s  Min: 1687.28 MB/s  Max: 1790.45 MB/s
>> [LNet Rates of servers]
>> [R] Avg: 19587 RPC/s  Min: 19293 RPC/s  Max: 19880 RPC/s
>> [W] Avg: 19586 RPC/s  Min: 19293 RPC/s  Max: 19880 RPC/s
>> [LNet Bandwidth of servers]
>> [R] Avg: 1752.62 MB/s  Min: 1695.38 MB/s  Max: 1809.85 MB/s
>> [W] Avg: 1752.62 MB/s  Min: 1695.38 MB/s  Max: 1809.85 MB/s
>> [LNet Rates of servers]
>> [R] Avg: 19528 RPC/s  Min: 19232 RPC/s  Max: 19823 RPC/s
>> [W] Avg: 19528 RPC/s  Min: 19232 RPC/s  Max: 19824 RPC/s
>> [LNet Bandwidth of servers]
>> [R] Avg: 1741.63 MB/s  Min: 1682.29 MB/s  Max: 1800.98 MB/s
>> [W] Avg: 1741.63 MB/s  Min: 1682.29 MB/s  Max: 1800.98 MB/s
>> session is ended
>> ./lnet-selftest.sh: line 17:  8835 Terminated              lst stat servers
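Since LNet itself checks out at full bandwidth, the remaining suspect is whether the MGS is actually up and answering before the MDT mount is attempted. A small retry wrapper (my own sketch, not a Lustre utility) could poll for that; the `lctl ping` / `service lustre` lines in the comment are the intended use on a real MDS and are shown for illustration only:

```shell
#!/bin/sh
# Hypothetical wrapper: retry a command up to N times, one second apart,
# until it succeeds. Returns 0 on success, 1 if all attempts fail.
wait_for() {
    tries=$1; shift
    while [ "$tries" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Intended use on mds1 (requires a running Lustre stack):
#   service lustre start MGS
#   wait_for 30 lctl ping 10.0.0.22@tcp
#   service lustre start fs0-MDT0000
```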
Addendum - I can start the MGS service on the 2nd node, and then start the
mdt0 service on the local node:
>> # ssh mds2 service lustre start MGS
>> Mounting lustre-mgs/mgs on /mnt/lustre/foreign/MGS
>> # service lustre start fs0-MDT0000
>> Mounting lustre-mdt0/mdt0 on /mnt/lustre/local/fs0-MDT0000
>> # service lustre status
>> unhealthy
>> # service lustre status local
>> running
>
> Cheers, Andreas

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
