While that point (the block size should be an integer multiple of the RAID stripe width) is a good one, violating it would explain slow writes, whereas Giovanni is reporting slow reads ...
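For reference, a rough sketch of the alignment arithmetic in question, assuming the 256 KiB strip size and the 8 data disks mentioned further down in the thread (the numbers are only illustrative):

    # A full RAID 6 stripe = number of data disks x strip size
    # 8 data disks x 256 KiB = 2048 KiB = 2 MiB per full stripe
    echo $((8 * 256))   # 2048 KiB full-stripe width
    # 1 MiB GPFS block -> half a stripe    -> parity read-modify-write on writes
    # 2 MiB GPFS block -> one full stripe  -> full-stripe writes, no parity read-back
    # 4 MiB GPFS block -> two full stripes -> full-stripe writes, no parity read-back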
Mit freundlichen Grüßen / Kind regards

Dr. Uwe Falke
IT Specialist
Global Technology Services / Project Services Delivery / High Performance Computing
+49 175 575 2877 Mobile
Rathausstr. 7, 09111 Chemnitz, Germany
uwefa...@de.ibm.com

IBM Services
IBM Data Privacy Statement
IBM Deutschland Business & Technology Services GmbH
Geschäftsführung: Dr. Thomas Wolter, Sven Schooss
Sitz der Gesellschaft: Ehningen
Registergericht: Amtsgericht Stuttgart, HRB 17122


From: "Luis Bolinches" <luis.bolinc...@fi.ibm.com>
To: "Giovanni Bracco" <giovanni.bra...@enea.it>
Cc: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>, agostino.fu...@enea.it
Date: 11/06/2020 16:11
Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN
Sent by: gpfsug-discuss-boun...@spectrumscale.org

8 data * 256K does not align to your 1MB.

RAID 6 is already not the best option for writes. I would look into using multiples of 2MB block sizes.

--
Cheers

> On 11. Jun 2020, at 17.07, Giovanni Bracco <giovanni.bra...@enea.it> wrote:
>
> 256K
>
> Giovanni
>
>> On 11/06/20 10:01, Luis Bolinches wrote:
>> On that RAID 6 what is the logical RAID block size? 128K, 256K, other?
>> --
>> Ystävällisin terveisin / Kind regards / Saludos cordiales / Salutations / Salutacions
>> Luis Bolinches
>> Consultant IT Specialist
>> IBM Spectrum Scale development
>> ESS & client adoption teams
>> Mobile Phone: +358503112585
>> https://www.youracclaim.com/user/luis-bolinches
>> Ab IBM Finland Oy
>> Laajalahdentie 23
>> 00330 Helsinki
>> Uusimaa - Finland
>>
>> "If you always give you will always have" -- Anonymous
>>
>> ----- Original message -----
>> From: Giovanni Bracco <giovanni.bra...@enea.it>
>> Sent by: gpfsug-discuss-boun...@spectrumscale.org
>> To: Jan-Frode Myklebust <janfr...@tanso.net>, gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>> Cc: Agostino Funel <agostino.fu...@enea.it>
>> Subject: [EXTERNAL] Re: [gpfsug-discuss] very low read performance in simple spectrum scale/gpfs cluster with a storage-server SAN
>> Date: Thu, Jun 11, 2020 10:53
>>
>> Comments and updates in the text:
>>
>>> On 05/06/20 19:02, Jan-Frode Myklebust wrote:
>>> On Fri, 5 Jun 2020 at 15:53, Giovanni Bracco <giovanni.bra...@enea.it> wrote:
>>>
>>>     answer in the text
>>>
>>>>    On 05/06/20 14:58, Jan-Frode Myklebust wrote:
>>>     >
>>>     > Could maybe be interesting to drop the NSD servers, and let all nodes
>>>     > access the storage via srp ?
>>>
>>>     no we can not: the production clusters fabric is a mix of a QDR based
>>>     cluster and an OPA based cluster, and the NSD nodes provide the service
>>>     to both.
>>>
>>> You could potentially still do SRP from QDR nodes, and via NSD for your
>>> omnipath nodes. Going via NSD seems like a bit pointless indirection.
>>
>> not really: both clusters, the 400 OPA nodes and the 300 QDR nodes, share
>> the same data lake in Spectrum Scale/GPFS, so the NSD servers support the
>> flexibility of the setup.
>>
>> The NSD servers make use of an IB SAN fabric (Mellanox FDR switch) where at
>> the moment 3 different generations of DDN storages are connected:
>> 9900/QDR, 7700/FDR and 7990/EDR.
>> The idea was to be able to add some less expensive storage, to be used
>> when performance is not the first priority.
>>
>>>     >
>>>     > Maybe turn off readahead, since it can cause performance degradation
>>>     > when GPFS reads 1 MB blocks scattered on the NSDs, so that read-ahead
>>>     > always reads too much. This might be the cause of the slow read seen?
>>>     > maybe you'll also overflow it if reading from both NSD servers at the
>>>     > same time?
>>>
>>>     I have switched the readahead off and this produced a small (~10%)
>>>     increase in performance when reading from an NSD server, but no change
>>>     in the bad behaviour for the GPFS clients.
>>>
>>>     > Plus.. it's always nice to give a bit more pagepool to the clients than
>>>     > the default.. I would prefer to start with 4 GB.
>>>
>>>     we'll do also that and we'll let you know!
>>>
>>> Could you show your mmlsconfig? Likely you should set maxMBpS to
>>> indicate what kind of throughput a client can do (affects GPFS
>>> readahead/writebehind). Would typically also increase workerThreads on
>>> your NSD servers.
>>
>> At this moment this is the output of mmlsconfig:
>>
>> # mmlsconfig
>> Configuration data for cluster GPFSEXP.portici.enea.it:
>> -------------------------------------------------------
>> clusterName GPFSEXP.portici.enea.it
>> clusterId 13274694257874519577
>> autoload no
>> dmapiFileHandleSize 32
>> minReleaseLevel 5.0.4.0
>> ccrEnabled yes
>> cipherList AUTHONLY
>> verbsRdma enable
>> verbsPorts qib0/1
>> [cresco-gpfq7,cresco-gpfq8]
>> verbsPorts qib0/2
>> [common]
>> pagepool 4G
>> adminMode central
>>
>> File systems in cluster GPFSEXP.portici.enea.it:
>> ------------------------------------------------
>> /dev/vsd_gexp2
>> /dev/vsd_gexp3
>>
>>> 1 MB blocksize is a bit bad for your 9+p+q RAID with 256 KB strip size.
>>> When you write one GPFS block, less than half a RAID stripe is written,
>>> which means you need to read back some data to calculate the new parities.
>>> I would prefer 4 MB block size, and maybe also change to 8+p+q so that
>>> one GPFS block is a multiple of a full 2 MB stripe.
>>>
>>> -jf
>>
>> we have now added another file system based on 2 NSDs on RAID 6 8+p+q,
>> keeping the 1MB block size just not to change too many things at the
>> same time, but there is no substantial change in the very low read
>> performance, which is still of the order of 50 MB/s, while write
>> performance is 1000 MB/s.
>>
>> Any other suggestion is welcomed!
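As a concrete starting point, the suggestions in this thread would translate into something along these lines; the values are only starting points, and the node class, device, stanza file and mount point names below are placeholders rather than the real configuration:

    # readahead/writebehind sizing hint and more I/O threads (tune to the hardware):
    mmchconfig maxMBpS=5000 -N nsdserver_nodes
    mmchconfig workerThreads=512 -N nsdserver_nodes
    # pagepool is already at 4G per the mmlsconfig output above

    # a test file system with 4 MiB blocks on the 8+p+q RAID 6 LUNs,
    # so one block maps onto two full 2 MiB stripes:
    mmcrfs /dev/vsd_gexp4 -F nsd_gexp4.stanza -B 4M

    # quick sequential read check from a client, bypassing cached data:
    dd if=/gexp4/bigfile of=/dev/null bs=4M count=2500 iflag=direct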
>>
>> Giovanni
>>
>> --
>> Giovanni Bracco
>> phone +39 351 8804788
>> E-mail giovanni.bra...@enea.it
>> WWW http://www.afs.enea.it/bracco
>>
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
> --
> Giovanni Bracco
> phone +39 351 8804788
> E-mail giovanni.bra...@enea.it
> WWW http://www.afs.enea.it/bracco

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss