Answers are inline in the text below.

On 05/06/20 14:58, Jan-Frode Myklebust wrote:

Could it maybe be interesting to drop the NSD servers and let all nodes access the storage via SRP?

No, we cannot: the production fabric is a mix of a QDR-based cluster and an OPA-based cluster, and the NSD servers provide the service to both.


Maybe turn off readahead, since it can cause performance degradation when GPFS reads 1 MB blocks scattered on the NSDs, so that readahead always reads too much. This might be the cause of the slow reads seen; maybe you'll also overflow it when reading from both NSD servers at the same time?

I have switched readahead off, and this produced a small (~10%) increase in performance when reading from an NSD server, but no change in the bad behaviour of the GPFS clients.
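
For reference, block-device readahead on the NSD servers can be switched off with commands of this kind (the device names below are only an example, not necessarily the ones we used):

  # disable readahead on the multipath device backing the NSD
  blockdev --setra 0 /dev/mapper/mpatha
  # equivalent sysfs setting (value in KiB)
  echo 0 > /sys/block/dm-0/queue/read_ahead_kb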



Plus, it's always nice to give a bit more pagepool to the clients than the default; I would prefer to start with 4 GB.

We'll also do that and we'll let you know!
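
As a sketch, assuming the two clients are called client1 and client2 (placeholder names), the change would look like:

  # give the GPFS clients a 4 GB pagepool instead of the default
  mmchconfig pagepool=4G -N client1,client2
  # restart GPFS on those nodes so the new pagepool takes effect
  mmshutdown -N client1,client2
  mmstartup -N client1,client2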

Giovanni




   -jf

On Fri, 5 Jun 2020 at 14:22, Giovanni Bracco <giovanni.bra...@enea.it> wrote:

    In our lab we have received two storage servers, Supermicro
    SSG-6049P-E1CR24L, with 24 HDs each (9 TB SAS3) and an Avago 3108 RAID
    controller (2 GB cache). Before putting them into production for other
    purposes, we have set up a small GPFS test cluster to verify whether
    they can be used as storage. (Our GPFS production cluster is licensed
    per NSD-server socket, so it would be interesting to expand the storage
    capacity just by adding storage servers to an InfiniBand-based SAN,
    without changing the number of NSD servers.)

    The test cluster consists of:

    1) two NSD servers (IBM x3550 M2), each with a dual-port QDR InfiniBand
    TrueScale HCA
    2) a Mellanox FDR switch used as the SAN switch
    3) a TrueScale QDR switch as the GPFS cluster switch
    4) two GPFS clients (Supermicro AMD nodes), with one QDR port each

    All the nodes run CentOS 7.7.

    On each storage server a RAID 6 volume of 11 disks (80 TB) has been
    configured and is exported over InfiniBand as a LIO SCSI target reached
    via SRP, so that both volumes appear as devices discovered by the
    srp_daemon on the NSD servers, where multipath (not really necessary in
    this case) has been configured for these two LIO-ORG devices.
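
    A minimal sketch of the kind of multipath.conf stanza involved for the
    LIO-ORG devices (values here are illustrative, not necessarily our
    exact configuration):

      # /etc/multipath.conf (illustrative)
      devices {
          device {
              vendor               "LIO-ORG"
              product              ".*"
              path_grouping_policy multibus
              path_checker         tur
          }
      }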

    GPFS version 5.0.4-0 has been installed and RDMA has been properly
    configured.
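
    In sketch form, the RDMA-related settings are of this kind (the
    HCA/port name below is only an example):

      # enable GPFS RDMA (verbs) and tell it which IB port to use
      mmchconfig verbsRdma=enable
      mmchconfig verbsPorts="qib0/1"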

    Two NSD disks have been created and a GPFS file system has been
    configured.
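
    In sketch form, the NSDs and the file system were created with commands
    of this kind (the stanza below is an approximation; device and server
    names are placeholders, while the NSD names match those shown by mmlsfs
    further down):

      # nsd.stanza
      %nsd: nsd=nsdfs4lun2 device=/dev/mapper/mpatha servers=nsdsrv1,nsdsrv2 usage=dataAndMetadata
      %nsd: nsd=nsdfs5lun2 device=/dev/mapper/mpathb servers=nsdsrv2,nsdsrv1 usage=dataAndMetadata

      mmcrnsd -F nsd.stanza
      mmcrfs vsd_gexp2 -F nsd.stanza -B 1M -T /gexp2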

    Very simple tests have been performed using lmdd serial write/read.
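
    The commands are of this kind (block size and file name are just an
    example; the results follow in the list below):

      # sequential write of a 100 GB file in 1 MB blocks
      lmdd if=internal of=/gexp2/testfile bs=1m count=102400
      # sequential read of the same file
      lmdd if=/gexp2/testfile of=internal bs=1m count=102400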

    1) Storage-server local performance: before configuring the RAID 6
    volume as an NSD disk, a local XFS file system was created on it, and
    lmdd write/read performance for a 100 GB file was verified to be about
    1 GB/s.

    2) Once the GPFS cluster had been created, write/read tests were
    performed directly from one NSD server at a time:

    write performance 2 GB/s, read performance 1 GB/s for a 100 GB file

    Checking with iostat showed that the I/O in this case involved only the
    NSD server where the test was performed: when writing, double the base
    performance was obtained, while reading gave the same performance as on
    the local file system, which seems correct. Values are stable when the
    test is repeated.

    3) When the same test is performed from the GPFS clients, the lmdd
    results for a 100 GB file are:

    write: 900 MB/s and stable; not too bad, but half of what is seen from
    the NSD servers.

    read: 30 MB/s to 300 MB/s, very low and unstable values.

    No tuning of any kind has been applied to the configuration of the
    involved systems; only default values are used.

    Any suggestions to explain the very bad read performance from a GPFS
    client?

    Giovanni

    Here are the configuration of the virtual drive on the storage server
    and the GPFS file system configuration:


    Virtual drive
    ==============

    Virtual Drive: 2 (Target Id: 2)
    Name                :
    RAID Level          : Primary-6, Secondary-0, RAID Level Qualifier-3
    Size                : 81.856 TB
    Sector Size         : 512
    Is VD emulated      : Yes
    Parity Size         : 18.190 TB
    State               : Optimal
    Strip Size          : 256 KB
    Number Of Drives    : 11
    Span Depth          : 1
    Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
    Current Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
    Default Access Policy: Read/Write
    Current Access Policy: Read/Write
    Disk Cache Policy   : Disabled
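
    The virtual-drive details above are the kind of output given by the
    MegaRAID CLI; assuming MegaCli is the tool in use, roughly:

      # show the logical drive configuration on the Avago/LSI 3108 controller
      MegaCli64 -LDInfo -Lall -aALL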


    GPFS file system from mmlsfs
    ============================

    mmlsfs vsd_gexp2
    flag                value                    description
    ------------------- ------------------------ -----------------------------------
     -f                 8192                     Minimum fragment (subblock) size in bytes
     -i                 4096                     Inode size in bytes
     -I                 32768                    Indirect block size in bytes
     -m                 1                        Default number of metadata replicas
     -M                 2                        Maximum number of metadata replicas
     -r                 1                        Default number of data replicas
     -R                 2                        Maximum number of data replicas
     -j                 cluster                  Block allocation type
     -D                 nfs4                     File locking semantics in effect
     -k                 all                      ACL semantics in effect
     -n                 512                      Estimated number of nodes that will mount file system
     -B                 1048576                  Block size
     -Q                 user;group;fileset       Quotas accounting enabled
                        user;group;fileset       Quotas enforced
                        none                     Default quotas enabled
     --perfileset-quota No                       Per-fileset quota enforcement
     --filesetdf        No                       Fileset df enabled?
     -V                 22.00 (5.0.4.0)          File system version
     --create-time      Fri Apr  3 19:26:27 2020 File system creation time
     -z                 No                       Is DMAPI enabled?
     -L                 33554432                 Logfile size
     -E                 Yes                      Exact mtime mount option
     -S                 relatime                 Suppress atime mount option
     -K                 whenpossible             Strict replica allocation option
     --fastea           Yes                      Fast external attributes enabled?
     --encryption       No                       Encryption enabled?
     --inode-limit      134217728                Maximum number of inodes
     --log-replicas     0                        Number of log replicas
     --is4KAligned      Yes                      is4KAligned?
     --rapid-repair     Yes                      rapidRepair enabled?
     --write-cache-threshold 0                   HAWC Threshold (max 65536)
     --subblocks-per-full-block 128              Number of subblocks per full block
     -P                 system                   Disk storage pools in file system
     --file-audit-log   No                       File Audit Logging enabled?
     --maintenance-mode No                       Maintenance Mode enabled?
     -d                 nsdfs4lun2;nsdfs5lun2    Disks in file system
     -A                 yes                      Automatic mount option
     -o                 none                     Additional mount options
     -T                 /gexp2                   Default mount point
     --mount-priority   0                        Mount priority


    --
    Giovanni Bracco
    phone  +39 351 8804788
    E-mail giovanni.bra...@enea.it
    WWW http://www.afs.enea.it/bracco





--
Giovanni Bracco
phone  +39 351 8804788
E-mail  giovanni.bra...@enea.it
WWW http://www.afs.enea.it/bracco
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
