Answers inline in the text.
On 05/06/20 14:58, Jan-Frode Myklebust wrote:
Could it maybe be interesting to drop the NSD servers and let all nodes
access the storage via SRP?
No, we cannot: the production cluster's fabric is a mix of a QDR-based
cluster and an OPA-based cluster, and the NSD nodes provide the service to both.
Maybe turn off readahead, since it can cause performance degradation
when GPFS reads 1 MB blocks scattered on the NSDs, so that readahead
always reads too much. This might be the cause of the slow reads seen;
maybe you'll also overflow it if reading from both NSD servers at the
same time?
I have switched readahead off and this produced a small (~10%)
increase in performance when reading from an NSD server, but no change
in the bad behaviour of the GPFS clients.
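For reference, a sketch of the kind of change involved, assuming it is the Linux block-layer readahead on the NSD servers' multipath devices that is meant (device names are illustrative):

  # check the current readahead, in 512-byte sectors
  blockdev --getra /dev/mapper/mpatha
  # switch it off
  blockdev --setra 0 /dev/mapper/mpatha
  # equivalent sysfs knob, in KiB, for the underlying dm device
  echo 0 > /sys/block/dm-0/queue/read_ahead_kb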
Plus, it's always nice to give a bit more pagepool to the clients than
the default; I would prefer to start with 4 GB.
We'll do that as well and we'll let you know!
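A sketch of the change we plan to try on the clients (node names are illustrative):

  # raise the pagepool from the default to 4 GB on the two clients only
  mmchconfig pagepool=4G -N client1,client2
  # restart GPFS on those nodes so the new value takes effect
  mmshutdown -N client1,client2
  mmstartup -N client1,client2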
Giovanni
-jf
Fri 5 Jun 2020 at 14:22, Giovanni Bracco <giovanni.bra...@enea.it> wrote:
In our lab we have received two storage servers, Supermicro
SSG-6049P-E1CR24L, each with 24 HDs (9 TB SAS3) and an Avago 3108 RAID
controller (2 GB cache). Before putting them into production for other
purposes, we have set up a small GPFS test cluster to verify whether they
can be used as storage (our GPFS production cluster has its licenses based
on the NSD server sockets, so it would be interesting to expand the storage
size just by adding storage servers to an InfiniBand-based SAN, without
changing the number of NSD servers).
The test cluster consists of:
1) two NSD servers (IBM x3550M2), each with a dual-port QDR InfiniBand
TrueScale HCA
2) a Mellanox FDR switch used as a SAN switch
3) a TrueScale QDR switch as the GPFS cluster switch
4) two GPFS clients (Supermicro AMD nodes), one QDR port each
All the nodes run CentOS 7.7.
On each storage server a RAID 6 volume of 11 disks, 80 TB, has been
configured and is exported via InfiniBand as an iSCSI target, so that
both appear as devices accessed by srp_daemon on the NSD servers,
where multipath (not really necessary in this case) has been configured
for these two LIO-ORG devices.
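For reference, a minimal multipath.conf sketch for the LIO-ORG devices, assuming mostly default settings (the values are illustrative, not our exact configuration):

  # /etc/multipath.conf (illustrative)
  devices {
      device {
          vendor                "LIO-ORG"
          product               ".*"
          path_grouping_policy  multibus
          path_checker          tur
          no_path_retry         12
      }
  }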
GPFS version 5.0.4-0 has been installed and RDMA has been properly
configured.
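For completeness, the kind of settings involved when enabling RDMA in GPFS (the verbsPorts value is illustrative, not necessarily our exact ports):

  # enable verbs RDMA and point GPFS at the IB ports
  mmchconfig verbsRdma=enable
  mmchconfig verbsPorts="qib0/1"
  # verify the settings
  mmlsconfig verbsRdma
  mmlsconfig verbsPorts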
Two NSD disks have been created and a GPFS file system has been
configured.
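Roughly how the NSDs and the file system were created (a sketch; device names, server order and usage are illustrative, only the NSD names, the 1 MB block size and the mount point come from the configuration reported below):

  # nsd.stanza (illustrative)
  %nsd: device=/dev/mapper/mpatha nsd=nsdfs4lun2 servers=nsd1,nsd2 usage=dataAndMetadata
  %nsd: device=/dev/mapper/mpathb nsd=nsdfs5lun2 servers=nsd2,nsd1 usage=dataAndMetadata

  mmcrnsd -F nsd.stanza
  mmcrfs vsd_gexp2 -F nsd.stanza -B 1M -T /gexp2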
Very simple tests have been performed using lmdd serial write/read.
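The tests were of this form (a sketch; the file name and the exact lmdd options are illustrative):

  # serial write of a 100 GB file in 1 MB records
  lmdd if=internal of=/gexp2/testfile bs=1m count=102400 fsync=1
  # serial read of the same file
  lmdd if=/gexp2/testfile of=internal bs=1m count=102400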
1) Storage-server local performance: before configuring the RAID 6 volume
as an NSD disk, a local xfs file system was created and the lmdd write/read
performance for a 100 GB file was verified to be about 1 GB/s.
2) Once the GPFS cluster had been created, write/read tests were
performed directly from one NSD server at a time:
write performance 2 GB/s, read performance 1 GB/s for a 100 GB file.
By checking with iostat (see the sketch after this list), it was observed
that the I/O in this case involved only the NSD server where the test was
performed, so when writing, double the base performance was obtained,
while in reading the performance was the same as on a local file system;
this seems correct. Values are stable when the test is repeated.
3) When the same test is performed from the GPFS clients, the lmdd results
for a 100 GB file are:
write: 900 MB/s and stable, not too bad but half of what is seen from
the NSD servers;
read: 30 MB/s to 300 MB/s, very low and unstable values.
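The iostat check mentioned in 2) was of this kind, run on both NSD servers during the tests (device names are illustrative):

  # extended per-device statistics in MB/s, refreshed every 2 seconds
  iostat -xm 2 dm-0 dm-1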
No tuning of any kind has been done on any of the involved systems;
only default values are in use.
Any suggestions to explain the very bad read performance from a GPFS
client?
Giovanni
Here is the configuration of the virtual drive on the storage server
and the file system configuration in GPFS:
Virtual drive
==============
Virtual Drive: 2 (Target Id: 2)
Name :
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 81.856 TB
Sector Size : 512
Is VD emulated : Yes
Parity Size : 18.190 TB
State : Optimal
Strip Size : 256 KB
Number Of Drives : 11
Span Depth : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disabled
GPFS file system from mmlsfs
============================
mmlsfs vsd_gexp2
flag                        value                    description
--------------------------- ------------------------ -----------------------------------
 -f                         8192                     Minimum fragment (subblock) size in bytes
 -i                         4096                     Inode size in bytes
 -I                         32768                    Indirect block size in bytes
 -m                         1                        Default number of metadata replicas
 -M                         2                        Maximum number of metadata replicas
 -r                         1                        Default number of data replicas
 -R                         2                        Maximum number of data replicas
 -j                         cluster                  Block allocation type
 -D                         nfs4                     File locking semantics in effect
 -k                         all                      ACL semantics in effect
 -n                         512                      Estimated number of nodes that will mount file system
 -B                         1048576                  Block size
 -Q                         user;group;fileset       Quotas accounting enabled
                            user;group;fileset       Quotas enforced
                            none                     Default quotas enabled
 --perfileset-quota         No                       Per-fileset quota enforcement
 --filesetdf                No                       Fileset df enabled?
 -V                         22.00 (5.0.4.0)          File system version
 --create-time              Fri Apr 3 19:26:27 2020  File system creation time
 -z                         No                       Is DMAPI enabled?
 -L                         33554432                 Logfile size
 -E                         Yes                      Exact mtime mount option
 -S                         relatime                 Suppress atime mount option
 -K                         whenpossible             Strict replica allocation option
 --fastea                   Yes                      Fast external attributes enabled?
 --encryption               No                       Encryption enabled?
 --inode-limit              134217728                Maximum number of inodes
 --log-replicas             0                        Number of log replicas
 --is4KAligned              Yes                      is4KAligned?
 --rapid-repair             Yes                      rapidRepair enabled?
 --write-cache-threshold    0                        HAWC Threshold (max 65536)
 --subblocks-per-full-block 128                      Number of subblocks per full block
 -P                         system                   Disk storage pools in file system
 --file-audit-log           No                       File Audit Logging enabled?
 --maintenance-mode         No                       Maintenance Mode enabled?
 -d                         nsdfs4lun2;nsdfs5lun2    Disks in file system
 -A                         yes                      Automatic mount option
 -o                         none                     Additional mount options
 -T                         /gexp2                   Default mount point
 --mount-priority           0                        Mount priority
--
Giovanni Bracco
phone +39 351 8804788
E-mail giovanni.bra...@enea.it
WWW http://www.afs.enea.it/bracco
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss