It seems expensive (straight mirroring rather than parity), and it's
asynchronous with respect to Lustre, so if you're really just syncing the block
devices, that can't guarantee safety on failure.  If I understand what you're
doing, when a failure occurs, drbd may be in the middle of syncing the block
device.  That would likely mean losing data you had already written, and
possibly corrupting the on-disk file system on the mirror.  (Specifically,
you'd end up having copied only part of something important before the failure
occurred.)
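
If you do stay with drbd, it's worth double-checking which replication protocol
you're running; only protocol C acknowledges a write after both nodes have it
on stable storage.  A minimal sketch of what I mean (the resource name, devices,
addresses and port below are placeholders, not your actual config):

    resource ost12 {
      net {
        protocol C;    # synchronous: a write completes only after both nodes have it
      }
      device    /dev/drbd0;    # placeholder drbd device
      disk      /dev/sdc1;     # placeholder backing disk on each host
      meta-disk internal;
      on node22 { address 192.0.2.22:7789; }    # placeholder addresses and port
      on node23 { address 192.0.2.23:7789; }
    }

Even with protocol C you'd want to test the failure cases carefully; the whole
stack is only as safe as its weakest ordering guarantee.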

________________________________
From: yu sun <[email protected]>
Sent: Wednesday, June 27, 2018 11:26:52 PM
To: Patrick Farrell
Cc: [email protected]; [email protected]
Subject: Re: [lustre-discuss] lctl ping node28@o2ib report Input/output error

Yes, drbd will mirror the contents of block devices between hosts, either
synchronously or asynchronously; this gives us data redundancy across hosts.
Perhaps we should use ZFS + drbd for the MDTs and OSTs?

Thanks
Yu

Patrick Farrell <[email protected]<mailto:[email protected]>> 于2018年6月27日周三 下午9:28写道:

I’m a little puzzled - it can switch, but isn’t the data on the failed disk 
lost...?  That’s why Andreas is suggesting RAID.  Or is drbd doing syncing of 
the disk?  That seems like a really expensive way to get redundancy, since it 
would have to be full online mirroring with all the costs in hardware and 
resource usage that implies...?

ZFS is not a requirement; it generally performs a bit worse than ldiskfs, but
makes up for it with impressive features that improve data integrity and
related things.  Since it sounds like that's not a huge concern for you, I
would stick with ldiskfs.  It will likely be a little faster and is easier to
set up.

________________________________
From: lustre-discuss <[email protected]> on behalf of yu sun <[email protected]>
Sent: Wednesday, June 27, 2018 8:21:43 AM
To: [email protected]<mailto:[email protected]>
Cc: [email protected]<mailto:[email protected]>
Subject: Re: [lustre-discuss] lctl ping node28@o2ib report Input/output error

Yes, you are right, thanks for your great suggestions.

We are currently using GlusterFS to store training data for ML, and we have
begun investigating Lustre as a replacement for GlusterFS for performance
reasons.

Firstly, yes, we do want to get maximum performance.  Do you mean we should use
ZFS, for example, and not put each OST/MDT on a separate partition, for better
performance?

Secondly, we don't use any underlying RAID devices, and we do configure each
OST on a separate disk.  Since Lustre does not provide disk-level data
redundancy, we use drbd + Pacemaker + Corosync for data redundancy and HA; you
can see we have configured --servicenode in mkfs.lustre.  I don't know how
reliable this solution is, but it seems OK in our current tests: when one disk
fails, Pacemaker can switch to the corresponding OST on the other machine
automatically.
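
For reference, a simplified sketch of the kind of Pacemaker resource we use for
one OST (the resource name, drbd device and mount point here are placeholders,
and the drbd promotion and ordering/colocation constraints are omitted):

    pcs resource create ost12-fs ocf:heartbeat:Filesystem \
        device=/dev/drbd0 directory=/lustre/ost12 fstype=lustre \
        op monitor interval=30s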

We also wanted to use ZFS, and I have tested ZFS with mirroring.  However, if
the physical machine goes down, the data on that machine is lost, so we decided
to use the solution listed above.

Now we are testing, and any suggestions are appreciated 😆.
Thanks, Andreas.

Yours,
Yu



Andreas Dilger <[email protected]<mailto:[email protected]>> 
于2018年6月27日周三 下午7:07写道:
On Jun 27, 2018, at 09:12, yu sun 
<[email protected]<mailto:[email protected]>> wrote:
>
> client:
> [email protected]:~$ mount -t lustre 
> node28@o2ib1:node29@o2ib1:/project /mnt/lustre_data
> mount.lustre: mount node28@o2ib1:node29@o2ib1:/project at /mnt/lustre_data 
> failed: Input/output error
> Is the MGS running?
> [email protected]:~$ lctl ping node28@o2ib1
> failed to ping 10.82.143.202@o2ib1: Input/output error
> [email protected]:~$
>
>
> mgs and mds:
>     mkfs.lustre --mgs --reformat --servicenode=node28@o2ib1 
> --servicenode=node29@o2ib1 /dev/sdb1
>     mkfs.lustre --fsname=project --mdt --index=0 --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1 --servicenode node28@o2ib1 --servicenode node29@o2ib1 
> --reformat --backfstype=ldiskfs /dev/sdc1

Separate from the LNet issues, it is probably worthwhile to point out some 
issues
with your configuration.  You shouldn't use partitions on the OST and MDT 
devices
if you want to get maximum performance.  That can offset all of the filesystem 
IO
from the RAID/sector alignment and hurt performance.
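
For example, for one of your OSTs you would format the whole device rather
than a partition, otherwise using the same options:

    mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 \
        --mgsnode=node29@o2ib1 --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 \
        --ost --index=12 /dev/sdc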

Secondly, it isn't clear if you are using underlying RAID devices, or if you are
configuring each OST on a separate disk?  It looks like the latter - that you 
are
making each disk a separate OST.  That isn't a good idea for Lustre, since it 
does
not (yet) have any redundancy at higher layers, and any disk failure would 
result
in data loss.  You currently need to have RAID-5/6 or ZFS for each OST/MDT, 
unless
this is a really "scratch" filesystem where you don't care if the data is lost 
and
reformatting the filesystem is OK (i.e. low cost is the primary goal, which is 
fine
also, but not very common).
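
As a rough sketch of the ZFS option (the pool/dataset name and the disk list
are placeholders only), a single OST built from a raidz2 vdev would be
formatted something like:

    mkfs.lustre --fsname=project --ost --index=12 --backfstype=zfs \
        --mgsnode=node28@o2ib1 --mgsnode=node29@o2ib1 \
        --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 \
        ostpool/ost12 raidz2 /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh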

We are working on Lustre-level data redundancy, and there is some support for
this
in the 2.11 release, but it is not yet in a state where you could reliably use 
it
to mirror all of the files in the filesystem.
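
For reference, that feature (File Level Redundancy) is managed per file with
commands along these lines (file names are only examples), and mirrors need an
explicit resync after a failure rather than being kept in sync automatically:

    lfs mirror create -N2 /mnt/lustre_data/newfile        # create a new file with 2 mirrors
    lfs mirror extend -N /mnt/lustre_data/existing_file   # add a mirror to an existing file
    lfs mirror resync /mnt/lustre_data/newfile            # resync stale mirrors after a failure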

Cheers, Andreas

>
> ost:
> ml-storage-ser22.nmg01:
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 
> --ost --index=12 /dev/sdc1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 
> --ost --index=13 /dev/sdd1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 
> --ost --index=14 /dev/sde1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 
> --ost --index=15 /dev/sdf1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 
> --ost --index=16 /dev/sdg1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 
> --ost --index=17 /dev/sdh1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 
> --ost --index=18 /dev/sdi1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 
> --ost --index=19 /dev/sdj1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 
> --ost --index=20 /dev/sdk1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 
> --ost --index=21 /dev/sdl1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 
> --ost --index=22 /dev/sdm1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node22@o2ib1 --servicenode=node23@o2ib1 
> --ost --index=23 /dev/sdn1
> ml-storage-ser26.nmg01:
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 
> --ost --index=36 /dev/sdc1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 
> --ost --index=37 /dev/sdd1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 
> --ost --index=38 /dev/sde1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 
> --ost --index=39 /dev/sdf1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 
> --ost --index=40 /dev/sdg1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 
> --ost --index=41 /dev/sdh1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 
> --ost --index=42 /dev/sdi1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 
> --ost --index=43 /dev/sdj1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 
> --ost --index=44 /dev/sdk1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 
> --ost --index=45 /dev/sdl1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 
> --ost --index=46 /dev/sdm1
>     mkfs.lustre --fsname=project --reformat --mgsnode=node28@o2ib1 
> --mgsnode=node29@o2ib1  --servicenode=node26@o2ib1 --servicenode=node27@o2ib1 
> --ost --index=47 /dev/sdn1
>
> Thanks
> Yu
>
> Mohr Jr, Richard Frank (Rick Mohr) <[email protected]> wrote on Wednesday, June 27, 2018 at 1:25 PM:
>
> > On Jun 27, 2018, at 12:52 AM, yu sun <[email protected]> wrote:
> >
> > I have created the file /etc/modprobe.d/lustre.conf with the following content on all MDT, OST,
> > and client nodes:
> > [email protected]:~$ cat /etc/modprobe.d/lustre.conf
> > options lnet networks="o2ib1(eth3.2)"
> > and I ran the command "lnetctl lnet configure --all" to make my static
> > lnet configuration take effect, but I still can't ping node28 from my
> > client ml-gpu-ser200.nmg01, nor can I mount or access Lustre on
> > client ml-gpu-ser200.nmg01.
>
> What options did you use when mounting the file system?
>
> --
> Rick Mohr
> Senior HPC System Administrator
> National Institute for Computational Sciences
> http://www.nics.tennessee.edu
>

Cheers, Andreas
---
Andreas Dilger
Principal Lustre Architect
Whamcloud







_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
