256K
Giovanni
On 11/06/20 10:01, Luis Bolinches wrote:
On that RAID 6, what is the logical RAID block size? 128K, 256K, other?
--
Ystävällisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions
Luis Bolinches
Consultant IT Specialist
IBM Spectrum Scale development
ESS & client adoption teams
Mobile Phone: +358503112585
https://www.youracclaim.com/user/luis-bolinches
Ab IBM Finland Oy
Laajalahdentie 23
00330 Helsinki
Uusimaa - Finland
*"If you always give you will always have" -- Anonymous*
----- Original message -----
From: Giovanni Bracco <giovanni.bra...@enea.it>
Sent by: gpfsug-discuss-boun...@spectrumscale.org
To: Jan-Frode Myklebust <janfr...@tanso.net>, gpfsug main discussion
list <gpfsug-discuss@spectrumscale.org>
Cc: Agostino Funel <agostino.fu...@enea.it>
Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance
in simple spectrum scale/gpfs cluster with a storage-server SAN
Date: Thu, Jun 11, 2020 10:53
Comments and updates in the text:
On 05/06/20 19:02, Jan-Frode Myklebust wrote:
> fre. 5. jun. 2020 kl. 15:53 skrev Giovanni Bracco <giovanni.bra...@enea.it>:
>
>     answer in the text
>
>     On 05/06/20 14:58, Jan-Frode Myklebust wrote:
>     >
>     > Could maybe be interesting to drop the NSD servers, and let all
>     > nodes access the storage via srp?
>
>     no, we cannot: the production clusters' fabric is a mix of a QDR-based
>     cluster and an OPA-based cluster, and the NSD nodes provide the service
>     to both.
>
> You could potentially still do SRP from the QDR nodes, and go via NSD for
> your Omni-Path nodes. Going via NSD seems like a bit of a pointless
> indirection.
Not really: both clusters, the 400 OPA nodes and the 300 QDR nodes, share
the same data lake in Spectrum Scale/GPFS, so the NSD servers provide the
flexibility of the setup.
The NSD servers make use of an IB SAN fabric (a Mellanox FDR switch) to
which, at the moment, three different generations of DDN storage are
connected: 9900/QDR, 7700/FDR and 7990/EDR. The idea was to be able to add
some less expensive storage, to be used when performance is not the first
priority.
>
>     >
>     > Maybe turn off readahead, since it can cause performance degradation
>     > when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead
>     > always reads too much. This might be the cause of the slow read seen --
>     > maybe you’ll also overflow it if reading from both NSD-servers at the
>     > same time?
>
>     I have switched the readahead off and this produced a small (~10%)
>     increase in performance when reading from an NSD server, but no change
>     in the bad behaviour for the GPFS clients.
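(Assuming the readahead switched off here is the Linux block-device
readahead on the NSD servers, rather than a setting on the DDN controllers,
a minimal sketch of the commands involved would be:

    blockdev --getra /dev/dm-X    # show current readahead, in 512-byte sectors
    blockdev --setra 0 /dev/dm-X  # switch readahead off for that LUN

where /dev/dm-X is a placeholder for each multipath device backing an NSD.)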
>
>     >
>     > Plus.. it’s always nice to give a bit more pagepool to the clients
>     > than the default.. I would prefer to start with 4 GB.
>
>     we'll do that as well and we'll let you know!
>
> Could you show your mmlsconfig? Likely you should set maxMBpS to
> indicate what kind of throughput a client can do (affects GPFS
> readahead/writebehind). Would typically also increase workerThreads on
> your NSD servers.
At the moment, this is the output of mmlsconfig:
# mmlsconfig
Configuration data for cluster GPFSEXP.portici.enea.it:
-------------------------------------------------------
clusterName GPFSEXP.portici.enea.it
clusterId 13274694257874519577
autoload no
dmapiFileHandleSize 32
minReleaseLevel 5.0.4.0
ccrEnabled yes
cipherList AUTHONLY
verbsRdma enable
verbsPorts qib0/1
[cresco-gpfq7,cresco-gpfq8]
verbsPorts qib0/2
[common]
pagepool 4G
adminMode central
File systems in cluster GPFSEXP.portici.enea.it:
------------------------------------------------
/dev/vsd_gexp2
/dev/vsd_gexp3
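(A minimal sketch of the remaining tuning suggested above -- pagepool is
already at 4G in the output; the node classes and values below are
placeholders to be adapted to the hardware, and the changes take effect once
GPFS is restarted on the affected nodes:

    mmchconfig maxMBpS=6000 -N <client_nodes>      # roughly 2x the per-node throughput target
    mmchconfig workerThreads=512 -N <nsd_servers>  # more I/O worker threads on the NSD servers
)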
>
> 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip size.
> When you write one GPFS block, less than half a RAID stripe is written,
> which means you need to read back some data to calculate the new parities.
> I would prefer 4 MB block size, and maybe also change to 8+p+q so that
> one GPFS block is a multiple of a full 2 MB stripe.
>
> -jf
we have now added another file system based on 2 NSDs on RAID6 8+p+q,
keeping the 1 MB block size just so as not to change too many things at the
same time, but there is no substantial change in the very low read
performance, which is still on the order of 50 MB/s, while write performance
is about 1000 MB/s.
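(For reference, the stripe arithmetic behind the block-size suggestion,
assuming the 256K strip size per data disk confirmed at the top of the
thread:

    9+p+q: 9 data strips x 256 KB = 2304 KB full stripe; a 1 MB GPFS block
           fills less than half a stripe, so writes need a read-modify-write
           to update parity
    8+p+q: 8 data strips x 256 KB = 2048 KB full stripe; a 1 MB block is
           still only half a stripe, while a 4 MB block maps to exactly two
           full stripes and avoids the parity read-back on writes)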
Any other suggestion is welcome!
Giovanni
--
Giovanni Bracco
phone +39 351 8804788
E-mail giovanni.bra...@enea.it
WWW http://www.afs.enea.it/bracco
Ellei edellä ole toisin mainittu: / Unless stated otherwise above:
Oy IBM Finland Ab
PL 265, 00101 Helsinki, Finland
Business ID, Y-tunnus: 0195876-3
Registered in Finland
--
Giovanni Bracco
phone +39 351 8804788
E-mail giovanni.bra...@enea.it
WWW http://www.afs.enea.it/bracco
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss