While that point (the block size should be an integer multiple of the RAID stripe width) is a good one, violating it would explain slow writes, whereas Giovanni is reporting slow reads ...
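For reference, a rough sketch of the alignment arithmetic in question, assuming the 256 KiB strip size and the 8 data disks mentioned further down in the thread (the numbers are only illustrative):

    # A full RAID 6 stripe = number of data disks x strip size
    # 8 data disks x 256 KiB = 2048 KiB = 2 MiB per full stripe
    echo $((8 * 256))   # 2048 KiB full-stripe width
    # 1 MiB GPFS block -> half a stripe    -> parity read-modify-write on writes
    # 2 MiB GPFS block -> one full stripe  -> full-stripe writes, no parity read-back
    # 4 MiB GPFS block -> two full stripes -> full-stripe writes, no parity read-back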
Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Global Technology Services / Project Services Delivery / High Performance Computing
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefa...@de.ibm.com

IBM Services
IBM Data Privacy Statement
IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Dr. Thomas Wolter, Sven Schooss
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122


From: "Luis Bolinches" <luis.bolinc...@fi.ibm.com>
To: "Giovanni Bracco" <giovanni.bra...@enea.it>
Cc: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>, agostino.fu...@enea.it
Date: 11/06/2020 16:11
Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN
Sent by: gpfsug-discuss-boun...@spectrumscale.org

8 data * 256K does not align to your 1MB.

RAID 6 is already not the best option for writes. I would look into using multiples of 2MB block sizes.

--
Cheers

> On 11. Jun 2020, at 17.07, Giovanni Bracco <giovanni.bra...@enea.it> wrote:
>
> 256K
>
> Giovanni
>
>> On 11/06/20 10:01, Luis Bolinches wrote:
>> On that RAID 6 what is the logical RAID block size? 128K, 256K, other?
>> --
>> Ystävällisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions
>> Luis Bolinches
>> Consultant IT Specialist
>> IBM Spectrum Scale development
>> ESS & client adoption teams
>> Mobile Phone: +358503112585
>> https://www.youracclaim.com/user/luis-bolinches
>> Ab IBM Finland Oy
>> Laajalahdentie 23
>> 00330 Helsinki
>> Uusimaa - Finland
>>
>> "If you always give you will always have" -- Anonymous
>>
>> ----- Original message -----
>> From: Giovanni Bracco <giovanni.bra...@enea.it>
>> Sent by: gpfsug-discuss-boun...@spectrumscale.org
>> To: Jan-Frode Myklebust <janfr...@tanso.net>, gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>> Cc: Agostino Funel <agostino.fu...@enea.it>
>> Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN
>> Date: Thu, Jun 11, 2020 10:53
>>
>> Comments and updates in the text:
>>
>>> On 05/06/20 19:02, Jan-Frode Myklebust wrote:
>>> On Fri, 5 Jun 2020 at 15:53, Giovanni Bracco <giovanni.bra...@enea.it> wrote:
>>>
>>>     answer in the text
>>>
>>>>    On 05/06/20 14:58, Jan-Frode Myklebust wrote:
>>>     >
>>>     > Could maybe be interesting to drop the NSD servers, and let all nodes
>>>     > access the storage via srp ?
>>>
>>>     no we can not: the production clusters fabric is a mix of a QDR based
>>>     cluster and an OPA based cluster, and the NSD nodes provide the service
>>>     to both.
>>>
>>> You could potentially still do SRP from QDR nodes, and via NSD for your
>>> omnipath nodes. Going via NSD seems like a bit pointless indirection.
>>
>> not really: both clusters, the 400 OPA nodes and the 300 QDR nodes, share
>> the same data lake in Spectrum Scale/GPFS, so the NSD servers support the
>> flexibility of the setup.
>>
>> The NSD servers make use of an IB SAN fabric (Mellanox FDR switch) where at
>> the moment 3 different generations of DDN storages are connected:
>> 9900/QDR, 7700/FDR and 7990/EDR.
>> The idea was to be able to add some less expensive storage, to be used
>> when performance is not the first priority.
>>
>>>     >
>>>     > Maybe turn off readahead, since it can cause performance degradation
>>>     > when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead
>>>     > always reads too much. This might be the cause of the slow read seen?
>>>     > maybe you'll also overflow it if reading from both NSD servers at the
>>>     > same time?
>>>
>>>     I have switched the readahead off and this produced a small (~10%)
>>>     increase in performance when reading from an NSD server, but no change
>>>     in the bad behaviour for the GPFS clients.
>>>
>>>     > Plus.. it's always nice to give a bit more pagepool to the clients than
>>>     > the default.. I would prefer to start with 4 GB.
>>>
>>>     we'll do also that and we'll let you know!
>>>
>>> Could you show your mmlsconfig? Likely you should set maxMBpS to
>>> indicate what kind of throughput a client can do (affects GPFS
>>> readahead/writebehind). Would typically also increase workerThreads on
>>> your NSD servers.
>>
>> At this moment this is the output of mmlsconfig:
>>
>> # mmlsconfig
>> Configuration data for cluster GPFSEXP.portici.enea.it:
>> -------------------------------------------------------
>> clusterName GPFSEXP.portici.enea.it
>> clusterId 13274694257874519577
>> autoload no
>> dmapiFileHandleSize 32
>> minReleaseLevel 5.0.4.0
>> ccrEnabled yes
>> cipherList AUTHONLY
>> verbsRdma enable
>> verbsPorts qib0/1
>> [cresco-gpfq7,cresco-gpfq8]
>> verbsPorts qib0/2
>> [common]
>> pagepool 4G
>> adminMode central
>>
>> File systems in cluster GPFSEXP.portici.enea.it:
>> ------------------------------------------------
>> /dev/vsd_gexp2
>> /dev/vsd_gexp3
>>
>>> 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip size.
>>> When you write one GPFS block, less than half a RAID stripe is written,
>>> which means you need to read back some data to calculate the new parities.
>>> I would prefer 4 MB block size, and maybe also change to 8+p+q so that
>>> one GPFS block is a multiple of a full 2 MB stripe.
>>>
>>> -jf
>>
>> we have now added another file system based on 2 NSDs on RAID 6 8+p+q,
>> keeping the 1MB block size just not to change too many things at the
>> same time, but there is no substantial change in the very low read
>> performance, which is still of the order of 50 MB/s, while write
>> performance is 1000 MB/s.
>>
>> Any other suggestion is welcomed!
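As a concrete starting point, the suggestions in this thread would translate into something along these lines; the values are only starting points, and the node class, device, stanza file and mount point names below are placeholders rather than the real configuration:

    # readahead/writebehind sizing hint and more I/O threads (tune to the hardware):
    mmchconfig maxMBpS=5000 -N nsdserver_nodes
    mmchconfig workerThreads=512 -N nsdserver_nodes
    # pagepool is already at 4G per the mmlsconfig output above

    # a test file system with 4 MiB blocks on the 8+p+q RAID 6 LUNs,
    # so one block maps onto two full 2 MiB stripes:
    mmcrfs /dev/vsd_gexp4 -F nsd_gexp4.stanza -B 4M

    # quick sequential read check from a client, bypassing cached data:
    dd if=/gexp4/bigfile of=/dev/null bs=4M count=2500 iflag=direct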
>>
>> Giovanni
>>
>> --
>> Giovanni Bracco
>> phone +39 351 8804788
>> E-mail giovanni.bra...@enea.it
>> WWW http://www.afs.enea.it/bracco
>>
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
> --
> Giovanni Bracco
> phone +39 351 8804788
> E-mail giovanni.bra...@enea.it
> WWW http://www.afs.enea.it/bracco

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss