Thank you very much, Nate. It works.
I set the "max_sectors" parameter to 4096:

# cat /etc/modprobe.d/mpt2sas.conf
options mpt2sas max_sectors=4096

and the bonnie++ tests executed successfully.

Regards,
Angelo

2016-03-28 19:55 GMT-03:00 Nate Pearlstein <[email protected]>:
> I thought I responded to the entire list but only sent to Angelo.
>
> Very likely, Lustre on the OSS nodes is setting max_sectors_kb all the way
> up to max_hw_sectors_kb, and this value ends up being too large for the
> SAS HCA. You should set max_sectors for your mpt2sas to something smaller,
> like 4096, and rebuild the initrd; this will put a better limit on
> max_hw_sectors_kb for the IS5600 LUNs.
>
> > On Mar 28, 2016, at 6:51 PM, Dilger, Andreas <[email protected]> wrote:
> >
> > On 2016/03/28, 08:01, "lustre-discuss on behalf of Angelo Cavalcanti"
> > <[email protected] on behalf of [email protected]> wrote:
> >
> > Dear all,
> >
> > We're having trouble with a Lustre 2.5.3 implementation. This is our setup:
> >
> > * One server for the MGS/MDS/MDT. The MDT is served from a RAID-6 backed
> >   partition of 2TB. (what type of disk?)
> >
> > Note that using RAID-6 for the MDT storage will significantly hurt your
> > metadata performance, since this will incur a lot of read-modify-write
> > overhead when doing 4KB metadata block updates.
> >
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Lustre Principal Architect
> > Intel High Performance Data Division
> >
> > * Two OSS/OST servers in an active/active HA configuration with
> >   Pacemaker. Both are connected to the storage via SAS.
> >
> > * One SGI InfiniteStorage IS5600 with two RAID-6 backed volume groups.
> >   Each group has two volumes; each volume has 15TB capacity.
> >
> > Volumes are recognized by the OSSs as multipath devices; each volume has
> > 4 paths. Volumes were created with a GPT partition table and a single
> > partition.
> > Volume partitions were then formatted as OSTs with the following command:
> >
> > # mkfs.lustre --replace --reformat --ost \
> >     --mkfsoptions="-E stride=128,stripe_width=1024" \
> >     --mountfsoptions="errors=remount-ro,extents,mballoc" \
> >     --fsname=lustre1 --mgsnode=10.149.0.153@o2ib1 --index=0 \
> >     --servicenode=10.149.0.151@o2ib1 --servicenode=10.149.0.152@o2ib1 \
> >     /dev/mapper/360080e500029eaec0000012656951fcap1
> >
> > Testing with bonnie++ on a client with the command below:
> >
> > $ ./bonnie++-1.03e/bonnie++ -m lustre1 -d /mnt/lustre -s 128G:1024k -n 0 -f -b -u vhpc
> >
> > There is no problem creating files inside the Lustre mount point, but
> > *rewriting* the same files results in the errors below:
> >
> > Mar 18 17:46:13 oss01 multipathd: 8:128: mark as failed
> > Mar 18 17:46:13 oss01 multipathd: 360080e500029eaec0000012656951fca: remaining active paths: 3
> > Mar 18 17:46:13 oss01 kernel: sd 1:0:0:0: [sdi] Unhandled error code
> > Mar 18 17:46:13 oss01 kernel: sd 1:0:0:0: [sdi] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
> > Mar 18 17:46:13 oss01 kernel: sd 1:0:0:0: [sdi] CDB: Read(10): 28 00 00 06 d8 22 00 20 00 00
> > Mar 18 17:46:13 oss01 kernel: __ratelimit: 109 callbacks suppressed
> > Mar 18 17:46:13 oss01 kernel: device-mapper: multipath: Failing path 8:128.
> > Mar 18 17:46:13 oss01 kernel: sd 1:0:1:0: [sdm] Unhandled error code
> > Mar 18 17:46:13 oss01 kernel: sd 1:0:1:0: [sdm] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
> > Mar 18 17:46:13 oss01 kernel: sd 1:0:1:0: [sdm] CDB: Read(10): 28 00 00 07 18 22 00 18 00 00
> > Mar 18 17:46:13 oss01 kernel: device-mapper: multipath: Failing path 8:192.
> > Mar 18 17:46:13 oss01 kernel: sd 1:0:1:0: [sdm] Unhandled error code
> > Mar 18 17:46:13 oss01 kernel: sd 1:0:1:0: [sdm] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
> > Mar 18 17:46:13 oss01 kernel: sd 1:0:1:0: [sdm] CDB: Read(10): 28 00 00 06 d8 22 00 20 00 00
> > Mar 18 17:46:13 oss01 kernel: sd 0:0:1:0: [sde] Unhandled error code
> > Mar 18 17:46:13 oss01 kernel: sd 0:0:1:0: [sde] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
> > Mar 18 17:46:13 oss01 kernel: sd 0:0:1:0: [sde] CDB: Read(10): 28 00 00 07 18 22 00 18 00 00
> > Mar 18 17:46:13 oss01 kernel: device-mapper: multipath: Failing path 8:64.
> > Mar 18 17:46:13 oss01 kernel: sd 0:0:0:0: [sda] Unhandled error code
> > Mar 18 17:46:13 oss01 kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
> > Mar 18 17:46:13 oss01 kernel: sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 07 18 22 00 18 00 00
> > Mar 18 17:46:13 oss01 kernel: device-mapper: multipath: Failing path 8:0.
> > Mar 18 17:46:13 oss01 kernel: sd 0:0:0:0: [sda] Unhandled error code
> > Mar 18 17:46:13 oss01 kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
> > Mar 18 17:46:13 oss01 kernel: sd 0:0:0:0: [sda] CDB: Read(10): 28 00 00 06 d8 22 00 20 00 00
> > Mar 18 17:46:14 oss01 multipathd: 360080e500029eaec0000012656951fca: sdi - rdac checker reports path is up
> > Mar 18 17:46:14 oss01 multipathd: 8:128: reinstated
> > Mar 18 17:46:14 oss01 multipathd: 360080e500029eaec0000012656951fca: remaining active paths: 4
> > Mar 18 17:46:14 oss01 kernel: sd 1:0:0:0: [sdi] Unhandled error code
> > Mar 18 17:46:14 oss01 kernel: sd 1:0:0:0: [sdi] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
> > Mar 18 17:46:14 oss01 kernel: sd 1:0:0:0: [sdi] CDB: Read(10): 28 00 00 07 18 22 00 18 00 00
> > Mar 18 17:46:14 oss01 kernel: device-mapper: multipath: Failing path 8:128.
> > Mar 18 17:46:14 oss01 kernel: sd 1:0:0:0: [sdi] Unhandled error code
> > Mar 18 17:46:14 oss01 kernel: sd 1:0:0:0: [sdi] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
> > Mar 18 17:46:14 oss01 kernel: sd 1:0:0:0: [sdi] CDB: Read(10): 28 00 00 06 d8 22 00 20 00 00
> > Mar 18 17:46:14 oss01 multipathd: 8:128: mark as failed
> > Mar 18 17:46:14 oss01 multipathd: 360080e500029eaec0000012656951fca: remaining active paths: 3
> > Mar 18 17:46:14 oss01 multipathd: 8:192: mark as failed
> > Mar 18 17:46:14 oss01 multipathd: 360080e500029eaec0000012656951fca: remaining active paths: 2
> > Mar 18 17:46:14 oss01 multipathd: 8:0: mark as failed
> > Mar 18 17:46:14 oss01 multipathd: 360080e500029eaec0000012656951fca: remaining active paths: 1
> > Mar 18 17:46:14 oss01 multipathd: 8:64: mark as failed
> > Mar 18 17:46:14 oss01 multipathd: 360080e500029eaec0000012656951fca: Entering recovery mode: max_retries=30
> > Mar 18 17:46:14 oss01 multipathd: 360080e500029eaec0000012656951fca: remaining active paths: 0
> > Mar 18 17:46:14 oss01 multipathd: 360080e500029eaec0000012656951fca: Entering recovery mode: max_retries=30
> > Mar 18 17:46:19 oss01 multipathd: 360080e500029eaec0000012656951fca: sdi - rdac checker reports path is up
> >
> > The multipath configuration ( /etc/multipath.conf ) is below, and is
> > correct according to the vendor (SGI).
> > defaults {
> >         user_friendly_names no
> > }
> >
> > blacklist {
> >         wwid "*"
> > }
> >
> > blacklist_exceptions {
> >         wwid "360080e500029eaec0000012656951fca"
> >         wwid "360080e500029eaec0000012956951fcb"
> >         wwid "360080e500029eaec0000012c56951fcb"
> >         wwid "360080e500029eaec0000012f56951fcb"
> > }
> >
> > devices {
> >         device {
> >                 vendor "SGI"
> >                 product "IS.*"
> >                 product_blacklist "Universal Xport"
> >                 getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
> >                 prio "rdac"
> >                 features "2 pg_init_retries 50"
> >                 hardware_handler "1 rdac"
> >                 path_grouping_policy "group_by_prio"
> >                 failback "immediate"
> >                 rr_weight "uniform"
> >                 no_path_retry 30
> >                 retain_attached_hw_handler "yes"
> >                 detect_prio "yes"
> >                 #rr_min_io 1000
> >                 path_checker "rdac"
> >                 #selector "round-robin 0"
> >                 #polling_interval 10
> >         }
> > }
> >
> > multipaths {
> >         multipath {
> >                 wwid "360080e500029eaec0000012656951fca"
> >         }
> >         multipath {
> >                 wwid "360080e500029eaec0000012956951fcb"
> >         }
> >         multipath {
> >                 wwid "360080e500029eaec0000012c56951fcb"
> >         }
> >         multipath {
> >                 wwid "360080e500029eaec0000012f56951fcb"
> >         }
> > }
> >
> > Many combinations of OST formatting options were tried, with both
> > internal and external journaling, but the same errors persist.
> >
> > The same bonnie++ tests were repeated on all volumes of the storage
> > using plain ext4, all successfully.
> >
> > Regards,
> > Angelo
> > _______________________________________________
> > lustre-discuss mailing list
> > [email protected]
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
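For reference, the stride and stripe_width values in the mkfs.lustre command above are consistent with a common RAID geometry; the exact chunk size and data-disk count of the IS5600 volume groups are not stated in the thread, so the figures below (512KB chunk, 8 data disks per RAID-6 group) are assumptions used only to illustrate how those ldiskfs options are derived:

    stride       = chunk_size / block_size = 512KB / 4KB = 128 blocks
    stripe_width = stride x data_disks     = 128 x 8     = 1024 blocks

If the actual array uses a different chunk size or disk count, both values should be recomputed accordingly.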
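Nate's fix, as confirmed at the top of the thread, can be sketched as the following sequence of admin commands; the dracut invocation and the sdi device name are illustrative (taken from the logs above) and should be adapted to the actual distribution and hardware:

    # Cap the request size the mpt2sas driver advertises. The max_sectors
    # module parameter is in 512-byte sectors, so 4096 sectors = 2048KB.
    # echo "options mpt2sas max_sectors=4096" > /etc/modprobe.d/mpt2sas.conf

    # Rebuild the initrd so the option takes effect at boot (EL6-style;
    # adjust the initramfs path and kernel version for your system):
    # dracut -f /boot/initramfs-$(uname -r).img $(uname -r)

    # After rebooting, verify the new ceiling on one of the multipath
    # member disks; with max_sectors=4096 this should report roughly 2048:
    # cat /sys/block/sdi/queue/max_hw_sectors_kb

With max_hw_sectors_kb capped below what the SAS HCA can actually handle, Lustre can no longer push max_sectors_kb to a value that triggers the DID_SOFT_ERROR path failures seen in the logs.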
