Hi, and thanks, Achim and Olaf,

mmdiag --iohist on all 4 NSD servers shows IO sizes of 16384 512-byte sectors (i.e. 8 MiB) throughout for IOs to/from the data NSDs (i.e. to/from storage), agreeing with the file system block size. (Having that information, I do not need to ask the clients ...)
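
For reference, the check is essentially just the following; the second command merely confirms the sector arithmetic (16384 sectors x 512 B = 8 MiB):

  # on each NSD server: recent GPFS IO history, request lengths reported in 512-byte sectors
  mmdiag --iohist
  echo $(( 16384 * 512 / 1024 / 1024 )) MiB    # -> 8 MiB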

iostat on the NSD servers as well as the storage system counters say the IOs issued by the OS layer are about 4 MiB, except on the one suspicious NSD server, where they were somewhat smaller than 4 MiB before the reboot but are now somewhat larger than 4 MiB (in both cases by a distinct amount).
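
On the iostat side that is nothing fancier than watching the average request size per device (on this RHEL 7 sysstat level it is, as far as I remember, the avgrq-sz column, given in 512-byte sectors; newer sysstat versions call it areq-sz and report kB):

  # per-device statistics at 5-second intervals on each NSD server
  iostat -x 5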

The data piped through the NSD servers is well balanced between the 4 of them; the IO system of the suspicious NSD server just issued a higher rate of IO requests when running the smaller IOs, and now, with the larger IOs, it has a lower IO rate than the other three NSD servers.


So I am pretty sure it is not GPFS (see my initial post :-); but still, some people using GPFS might have encountered this as well, or might have an idea ;-)

Cheers

Uwe

On 24.02.22 13:47, Olaf Weiser wrote:
In addition to Achim:
where do you see those "smaller IOs"?
Have you checked the IO sizes with mmfsadm dump iohist on each NSD client/server? If they are OK on that level, it's not GPFS.
Kind regards

Olaf Weiser

    ----- Original Message -----
    From: "Achim Rehor" <achim.re...@de.ibm.com>
    Sent by: gpfsug-discuss-boun...@spectrumscale.org
    To: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org>
    CC:
    Subject: [EXTERNAL] Re: [gpfsug-discuss] IO sizes
    Date: Thu, 24 Feb 2022 13:41

    Hi Uwe,

    first of all, glad to see you back in the GPFS space ;)

    agreed, groups of subblocks being written will end up in IO sizes
    smaller than the 8 MiB filesystem block size;
    also agreed, this cannot be metadata, since metadata IOs are MUCH
    smaller, like 4k or less, mostly.

    But why would these grouped subblock reads/writes all end up on
    the same NSD server, while the others do full-block writes?

    How is your NSD server setup per NSD? Did you set the preferred
    NSD server per NSD in a 'round-robin' fashion?
    Are the client nodes transferring the data doing anything specific?

    Sorry for not having a solution for you, just sharing a few ideas ;)


    Kind regards

    Achim Rehor

    Technical Support Specialist Spectrum Scale and ESS (SME)
    Advisory Product Services Professional
    IBM Systems Storage Support - EMEA


    gpfsug-discuss-boun...@spectrumscale.org wrote on 23/02/2022 22:20:11:

    > From: "Andrew Beattie" <abeat...@au1.ibm.com>
    > To: "gpfsug main discussion list" <gpfsug-discuss@spectrumscale.org>
    > Date: 23/02/2022 22:20
    > Subject: [EXTERNAL] Re: [gpfsug-discuss] IO sizes
    > Sent by: gpfsug-discuss-boun...@spectrumscale.org
    >
    > Alex,
    >
    > Metadata will be 4 KiB.
    >
    > Depending on the filesystem version you will also have subblocks to
    > consider: V4 filesystems have 1/32 subblocks, V5 filesystems have
    > 1/1024 subblocks (assuming metadata and data block size is the same).
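    >
    > To put rough numbers on that for the 8 MiB block size used here
    > (illustrative only; the exact V5 subblock size actually depends on the
    > block size chosen at file system creation):
    >
    >   # back-of-the-envelope subblock sizes, taking the ratios above at face value
    >   echo "V4 subblock: $(( 8 * 1024 / 32 )) KiB"      # 256 KiB
    >   echo "V5 subblock: $(( 8 * 1024 / 1024 )) KiB"    # 8 KiB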
    >
    > My first question would be: are you sure that the Linux OS is
    > configured the same on all 4 NSD servers?
    >
    > My second question would be: do you know what your average file size
    > is? If most of your files are smaller than your filesystem block
    > size, then you are always going to be performing writes using groups
    > of subblocks rather than full block writes.
    >
    > Regards,
    >
    > Andrew
    >
    > On 24 Feb 2022, at 04:39, Alex Chekholko <a...@calicolabs.com> wrote:

    > Hi,
    >
    > Metadata I/Os will always be smaller than the usual data block size, right?
    > Which version of GPFS?
    >
    > Regards,
    > Alex
    >
    > On Wed, Feb 23, 2022 at 10:26 AM Uwe Falke <uwe.fa...@kit.edu> wrote:
    > Dear all,
    >
    > sorry for asking a question which does not seem directly GPFS-related:
    >
    > In a setup with 4 NSD servers (old-style, with storage controllers in
    > the back end), 12 clients and 10 Seagate storage systems, I do see in
    > benchmark tests that just one of the NSD servers does send smaller IO
    > requests to the storage than the other 3 (that is, both reads and
    > writes are smaller).
    >
    > The NSD servers form 2 pairs; each pair is connected to 5 Seagate boxes
    > (one server to the A controllers, the other one to the B controllers of
    > the Seagates, resp.).
    >
    > All 4 NSD servers are set up similarly:
    >
    > kernel: 3.10.0-1160.el7.x86_64 #1 SMP
    >
    > HBA: Broadcom / LSI Fusion-MPT 12GSAS/PCIe Secure SAS38xx
    >
    > driver: mpt3sas 31.100.01.00
    >
    > max_sectors_kb=8192 (max_hw_sectors_kb=16383, not 16384, as limited by
    > mpt3sas) for all sd devices and all multipath (dm) devices built on top.
    >
    > scheduler: deadline
    >
    > multipath (actually we do have 3 paths to each volume, so there is some
    > asymmetry, but that should not affect the IOs, should it? And if it
    > did, we would see the same effect in both pairs of NSD servers, but we
    > do not).
    >
    > All 4 storage systems are also configured the same way (2 disk groups /
    > pools / declustered arrays, one managed by ctrl A, one by ctrl B, and
    > 8 volumes out of each; makes altogether 2 x 8 x 10 = 160 NSDs).
    >
    >
    > The GPFS block size is 8 MiB; according to iohistory (mmdiag) we do see
    > clean IO requests of 16384 512-byte sectors (i.e. 8192 KiB) from GPFS.
    >
    > The first question I have - but that is not my main one: I do see, both
    > in iostat and on the storage systems, that the default IO requests are
    > about 4 MiB, not 8 MiB as I'd expect from the above settings
    > (max_sectors_kb is really in terms of kiB, not sectors, cf.
    > https://www.kernel.org/doc/Documentation/block/queue-sysfs.txt).
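    >
    > In case it helps, the limits I refer to are just the block-layer sysfs
    > knobs below (sdX / dm-N are placeholders for the actual device names):
    >
    >   cat /sys/block/sdX/queue/max_sectors_kb       # 8192 on all devices here
    >   cat /sys/block/sdX/queue/max_hw_sectors_kb    # 16383, capped by mpt3sas
    >   cat /sys/block/dm-N/queue/max_sectors_kb      # same on the dm devices on top
    >   cat /sys/block/sdX/queue/scheduler            # deadline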
    >
    > But what puzzles me even more: one of the servers assembles even
    > smaller IOs, varying between 3.2 MiB and 3.6 MiB mostly - both for
    > reads and writes ... I just cannot see why.
    >
    > I have to suspect that this will (when writing to the storage) cause
    > incomplete stripe writes on our erasure-coded volumes (8+2p), as long
    > as the controller is not able to re-coalesce the data properly (and it
    > seems it cannot do that completely, at least).
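    >
    > Just to spell out the arithmetic behind that suspicion (the 512 kiB
    > chunk per data drive is only an assumption for illustration; the real
    > chunk size depends on how the Seagate volumes are configured):
    >
    >   # full-stripe size of an 8+2p volume with an assumed 512 kiB chunk
    >   echo $(( 8 * 512 )) kiB    # 4096 kiB = 4 MiB per full stripe
    >   # a 3.2-3.6 MiB request cannot fill such a stripe and forces a read-modify-write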
    >
    >
    > If any of you has seen that already and/or knows a potential
    > explanation, I'd be glad to learn about it.
    >
    >
    > And if some of you wonder: yes, I (was) moved away from IBM and am now
    > at KIT.
    >
    > Many thanks in advance
    >
    > Uwe
    >
    >
    > --
    > Karlsruhe Institute of Technology (KIT)
    > Steinbuch Centre for Computing (SCC)
    > Scientific Data Management (SDM)
    >
    > Uwe Falke
    >
    > Hermann-von-Helmholtz-Platz 1, Building 442, Room 187
    > D-76344 Eggenstein-Leopoldshafen
    >
    > Tel: +49 721 608 28024
    > Email: uwe.fa...@kit.edu
    > www.scc.kit.edu
    >
    > Registered office:
    > Kaiserstraße 12, 76131 Karlsruhe, Germany
    >
    > KIT – The Research University in the Helmholtz Association
    >


--
Karlsruhe Institute of Technology (KIT)
Steinbuch Centre for Computing (SCC)
Scientific Data Management (SDM)

Uwe Falke

Hermann-von-Helmholtz-Platz 1, Building 442, Room 187
D-76344 Eggenstein-Leopoldshafen

Tel: +49 721 608 28024
Email: uwe.fa...@kit.edu
www.scc.kit.edu

Registered office:
Kaiserstraße 12, 76131 Karlsruhe, Germany

KIT – The Research University in the Helmholtz Association


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
