Re: [gpfsug-discuss] Write performances and filesystem size

Ivano Talamo Wed, 22 Nov 2017 00:24:07 -0800

Hello Olaf,

thank you for your reply and for confirming that this is not expected, as we also thought. We repeated the test with 2 vdisks only, without dedicated ones for metadata, but the result did not change.


We have now opened a PMR.

Thanks,
Ivano

On 16/11/17 17:08, Olaf Weiser wrote:

Hi Ivano,
from this output, the performance degradation cannot be explained. In my
current environments I have multiple file systems (i.e. several vdisks on
one building block, BB), and it works fine.

As said, just open a PMR; I wouldn't consider this "expected behavior".
The only thing is that the MD disks are a bit small, so maybe redo your
tests, and for a simple comparison between 1/1, 1/2 and 1/4 capacity, test
with 2 vdisks only and dataAndMetadata.
Cheers

From:        Ivano Talamo <[email protected]>
To:        gpfsug main discussion list <[email protected]>
Date:        11/16/2017 08:52 AM
Subject:        Re: [gpfsug-discuss] Write performances and filesystem size
Sent by:        [email protected]
------------------------------------------------------------------------



Hi,

as additional information I paste the recovery group information for the
full-size and half-size cases.
In both cases:
- data is on sf_g_01_vdisk01
- metadata on sf_g_01_vdisk02
- sf_g_01_vdisk07 is not used in the filesystem.

This is with the full-space filesystem:

                     declustered                  current         allowable
 recovery group      arrays       vdisks  pdisks  format version  format version
 ------------------  -----------  ------  ------  --------------  --------------
 sf-g-01                       3       6      86  4.2.2.0         4.2.2.0


 declustered  needs                            replace                scrub     background activity
    array     service  vdisks  pdisks  spares  threshold  free space  duration  task   progress  priority
 -----------  -------  ------  ------  ------  ---------  ----------  --------  -------------------------
 NVR          no            1       2     0,0          1    3632 MiB   14 days  scrub       95%  low
 DA1          no            4      83    2,44          1      57 TiB   14 days  scrub        0%  low
 SSD          no            1       1     0,0          1     372 GiB   14 days  scrub       79%  low

                                          declustered                          checksum
 vdisk                 RAID code          array        vdisk size  block size  granularity  state  remarks
 --------------------  -----------------  -----------  ----------  ----------  -----------  -----  ------------
 sf_g_01_logTip        2WayReplication    NVR              48 MiB       2 MiB         4096  ok     logTip
 sf_g_01_logTipBackup  Unreplicated       SSD              48 MiB       2 MiB         4096  ok     logTipBackup
 sf_g_01_logHome       4WayReplication    DA1             144 GiB       2 MiB         4096  ok     log
 sf_g_01_vdisk02       3WayReplication    DA1             103 GiB       1 MiB       32 KiB  ok
 sf_g_01_vdisk07       3WayReplication    DA1             103 GiB       1 MiB       32 KiB  ok
 sf_g_01_vdisk01       8+2p               DA1             540 TiB      16 MiB       32 KiB  ok

 config data         declustered array   spare space    remarks
 ------------------  ------------------  -------------  -------
 rebuild space       DA1                 53 pdisk       increasing VCD spares is suggested

 config data         disk group fault tolerance         remarks
 ------------------  ---------------------------------  -------
 rg descriptor       1 enclosure + 1 drawer + 2 pdisk   limited by rebuild space
 system index        1 enclosure + 1 drawer + 2 pdisk   limited by rebuild space

 vdisk               disk group fault tolerance         remarks
 ------------------  ---------------------------------  -------
 sf_g_01_logTip      1 pdisk
 sf_g_01_logTipBackup  0 pdisk
 sf_g_01_logHome     1 enclosure + 1 drawer + 1 pdisk   limited by rebuild space
 sf_g_01_vdisk02     1 enclosure + 1 drawer             limited by rebuild space
 sf_g_01_vdisk07     1 enclosure + 1 drawer             limited by rebuild space
 sf_g_01_vdisk01     2 pdisk


This is with the half-space filesystem:

                     declustered                  current         allowable
 recovery group      arrays       vdisks  pdisks  format version  format version
 ------------------  -----------  ------  ------  --------------  --------------
 sf-g-01                       3       6      86  4.2.2.0         4.2.2.0


 declustered  needs                            replace                scrub     background activity
    array     service  vdisks  pdisks  spares  threshold  free space  duration  task   progress  priority
 -----------  -------  ------  ------  ------  ---------  ----------  --------  -------------------------
 NVR          no            1       2     0,0          1    3632 MiB   14 days  scrub        4%  low
 DA1          no            4      83    2,44          1     395 TiB   14 days  scrub        0%  low
 SSD          no            1       1     0,0          1     372 GiB   14 days  scrub       79%  low

                                          declustered                          checksum
 vdisk                 RAID code          array        vdisk size  block size  granularity  state  remarks
 --------------------  -----------------  -----------  ----------  ----------  -----------  -----  ------------
 sf_g_01_logTip        2WayReplication    NVR              48 MiB       2 MiB         4096  ok     logTip
 sf_g_01_logTipBackup  Unreplicated       SSD              48 MiB       2 MiB         4096  ok     logTipBackup
 sf_g_01_logHome       4WayReplication    DA1             144 GiB       2 MiB         4096  ok     log
 sf_g_01_vdisk02       3WayReplication    DA1             103 GiB       1 MiB       32 KiB  ok
 sf_g_01_vdisk07       3WayReplication    DA1             103 GiB       1 MiB       32 KiB  ok
 sf_g_01_vdisk01       8+2p               DA1             270 TiB      16 MiB       32 KiB  ok

 config data         declustered array   spare space    remarks
 ------------------  ------------------  -------------  -------
 rebuild space       DA1                 68 pdisk       increasing VCD spares is suggested

 config data         disk group fault tolerance         remarks
 ------------------  ---------------------------------  -------
 rg descriptor       1 node + 3 pdisk                   limited by rebuild space
 system index        1 node + 3 pdisk                   limited by rebuild space

 vdisk               disk group fault tolerance         remarks
 ------------------  ---------------------------------  -------
 sf_g_01_logTip      1 pdisk
 sf_g_01_logTipBackup  0 pdisk
 sf_g_01_logHome     1 node + 2 pdisk                   limited by rebuild space
 sf_g_01_vdisk02     1 node + 1 pdisk                   limited by rebuild space
 sf_g_01_vdisk07     1 node + 1 pdisk                   limited by rebuild space
 sf_g_01_vdisk01     2 pdisk
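
A quick consistency check between the two listings: vdisk01 shrinks from 540 TiB to 270 TiB, while DA1 free space grows from 57 TiB to 395 TiB. The sketch below assumes (an assumption, not taken from GNR documentation) that an 8+2p vdisk consumes raw space in the ratio 10:8 and ignores any other overhead. Under that assumption the raw space released by the smaller vdisk01 should match the extra free space reported in DA1.

```python
# Values copied from the two mmlsrecoverygroup listings above (TiB).
FULL_VDISK01_TIB = 540
HALF_VDISK01_TIB = 270
FULL_DA1_FREE_TIB = 57
HALF_DA1_FREE_TIB = 395

# Assumption: 8+2p stores 2 parity strips for every 8 data strips,
# so the raw footprint is usable * (8 + 2) / 8. Real overhead may differ slightly.
DATA_STRIPS, PARITY_STRIPS = 8, 2

def raw_footprint_tib(usable_tib: float) -> float:
    """Raw declustered-array space consumed by an 8+2p vdisk of the given usable size."""
    return usable_tib * (DATA_STRIPS + PARITY_STRIPS) / DATA_STRIPS

released_raw = raw_footprint_tib(FULL_VDISK01_TIB - HALF_VDISK01_TIB)
extra_free = HALF_DA1_FREE_TIB - FULL_DA1_FREE_TIB

print(f"raw space released by shrinking vdisk01: {released_raw:.1f} TiB")
print(f"extra free space reported in DA1:        {extra_free} TiB")
print(f"difference:                              {abs(extra_free - released_raw):.1f} TiB")
```

The 337.5 TiB estimate and the 338 TiB increase agree to within rounding. Under this assumption, the only difference between the two setups is the size of vdisk01, which in both cases sits on the same 83 pdisks of DA1.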


Thanks,
Ivano




On 16/11/17 13:03, Olaf Weiser wrote:

Thanks, that makes it a bit clearer. As your vdisk is big enough to span
all pdisks in each of your tests (1/1, 1/2 or 1/4 of the capacity), they
should all deliver the same performance.

You mentioned something about the vdisk layout. So in your full-capacity
test you use just one vdisk per RG, i.e. 2 in total for data, right?

What about metadata: did you create a separate vdisk for MD, and if so,
what size?

Sent from IBM Verse

Ivano Talamo --- Re: [gpfsug-discuss] Write performances and filesystem size ---

From:                 "Ivano Talamo" <[email protected]>
To:                 "gpfsug main discussion list" <[email protected]>
Date:                 Thu. 16.11.2017 03:49
Subject:                 Re: [gpfsug-discuss] Write performances and filesystem size

------------------------------------------------------------------------

Hello Olaf,

yes, I confirm that is the Lenovo version of the ESS GL2, so 2
enclosures/4 drawers/166 disks in total.

Each recovery group has one declustered array with all disks inside, so
vdisks use all the physical ones, even in the case of a vdisk that is
1/4 of the total size.

Regarding the layout allocation we used scatter.

The tests were done on the newly created filesystem, so there is no
close-to-full effect. And we ran gpfsperf write seq.

Thanks,
Ivano


On 16/11/17 04:42, Olaf Weiser wrote:

Sure, as long as we assume that really all physical disks are used. But
using 1/2 or 1/4 of the capacity might turn out to mean that one or two
complete enclosures are eliminated; that's why I was asking for more
details.

I don't see this degradation in my environments. As long as the vdisks are
big enough to span all pdisks (which should be the case for capacities in
the TB range), the performance stays the same.
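
To put "big enough to span all pdisks" into numbers, here is a back-of-the-envelope sketch. It uses the values from the recovery group listings in this thread: 83 pdisks in DA1, 8+2p, and a 16 MiB block size. It assumes that each 16 MiB block is written as one full stripe of 10 strips on 10 different pdisks, and that declustering spreads stripes uniformly over the array. Real GNR placement is more elaborate, so treat the result as an order of magnitude only.

```python
# Back-of-the-envelope model (assumption: uniform full-stripe placement over the DA).
TIB = 2**40
MIB = 2**20

PDISKS_IN_DA = 83          # DA1 pdisk count from the mmlsrecoverygroup listings
BLOCK_SIZE = 16 * MIB      # vdisk01 block size; assumed to be one full stripe
STRIPS_PER_STRIPE = 8 + 2  # 8+2p: 8 data + 2 parity strips, each on a different pdisk

def strips_per_pdisk(vdisk_tib: float) -> float:
    """Expected number of strips landing on each pdisk for a vdisk of the given size."""
    stripes = vdisk_tib * TIB / BLOCK_SIZE
    return stripes * STRIPS_PER_STRIPE / PDISKS_IN_DA

for label, size_tib in [("full", 540), ("1/2", 270), ("1/4", 135), ("1 TiB", 1)]:
    print(f"{label:>6}: ~{strips_per_pdisk(size_tib):,.0f} strips per pdisk")
```

Under these assumptions, even a 1 TiB vdisk places several thousand strips on every pdisk. A 135 TiB quarter-size vdisk is therefore nowhere near touching only a subset of the disks, which is consistent with the iostat observation that all disks are accessed.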

Sent from IBM Verse

Jan-Frode Myklebust --- Re: [gpfsug-discuss] Write performances and filesystem size ---

From:    "Jan-Frode Myklebust" <[email protected]>
To:    "gpfsug main discussion list" <[email protected]>
Date:    Wed. 15.11.2017 21:35
Subject:    Re: [gpfsug-discuss] Write performances and filesystem size

------------------------------------------------------------------------

Olaf, this looks like a Lenovo «ESS GLxS» version. It should be using the same
number of spindles for any size filesystem, so I would also expect them
to perform the same.



-jf


On Wed 15 Nov 2017 at 11:26, Olaf Weiser <[email protected]> wrote:

    To add a comment, very simply: it depends on how you allocate the
    physical block storage. If you are simply using fewer physical
    resources when reducing the capacity (in the same ratio), you get
    what you see.

    So you need to tell us how you allocate your block storage. Are you
    using RAID controllers? Where are your LUNs coming from? Are fewer
    RAID groups involved when you reduce the capacity?

    GPFS can be configured to give you pretty much as much as the
    hardware can deliver. If you reduce resources, you'll get less; if
    you enhance your hardware, you get more, almost regardless of the
    total capacity in number of blocks.






    From:        "Kumaran Rajaram" <[email protected]>
    To:        gpfsug main discussion list <[email protected]>
    Date:        11/15/2017 11:56 AM
    Subject:        Re: [gpfsug-discuss] Write performances and filesystem size
    Sent by:        [email protected]

------------------------------------------------------------------------




    Hi,

    >>Am I missing something? Is this expected behaviour, and does someone
    have an explanation for this?

    Based on your scenario, write performance degrading as the file system
    is populated is possible if you formatted the file system with
    "-j cluster".

    For consistent file-system performance, we recommend the mmcrfs
    "-j scatter" layoutMap. Also, we need to ensure that the mmcrfs "-n"
    option is set properly.

    [snip from mmcrfs]
    # mmlsfs <fs> | egrep 'Block allocation| Estimated number'
    -j                 scatter                  Block allocation type
    -n                 128                      Estimated number of nodes that will mount file system
    [/snip]
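
If you need to check these two settings on many file systems, the output above can be parsed. Below is a minimal Python sketch. It assumes the "flag value description" line format shown in the snip. `parse_mmlsfs` and `check_layout` are hypothetical helper names, and the input text would come from running `mmlsfs <fs>` yourself.

```python
import re

# Matches "flag  value  description" lines such as
# "-j                 scatter                  Block allocation type".
# A separator line like "-------" does not match, because a letter must follow the dashes.
FLAG_LINE = re.compile(r"^\s*(-{1,2}[A-Za-z][\w-]*)\s+(\S+)")

def parse_mmlsfs(output: str) -> dict[str, str]:
    """Map each mmlsfs flag to its first whitespace-delimited value (hypothetical helper)."""
    settings = {}
    for line in output.splitlines():
        match = FLAG_LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings

def check_layout(settings: dict[str, str], expected_nodes: int) -> list[str]:
    """Return warnings for the two settings discussed in this thread (hypothetical helper)."""
    warnings = []
    if settings.get("-j") != "scatter":
        warnings.append(f"block allocation is {settings.get('-j')!r}, not 'scatter'")
    if int(settings.get("-n", "0")) < expected_nodes:
        warnings.append(f"-n is {settings.get('-n')}, below the expected {expected_nodes} nodes")
    return warnings

sample = """\
-j                 scatter                  Block allocation type
-n                 128                      Estimated number of nodes that will mount file system
"""
print(check_layout(parse_mmlsfs(sample), expected_nodes=64))
```

An empty list means both settings look as recommended. Keep in mind that, per the man page excerpt, `-j` cannot be changed after the storage pool is created, and a changed `-n` only affects newly created data structures.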


    [snip from man mmcrfs]
    layoutMap={scatter|cluster}
                     Specifies the block allocation map type. When
                     allocating blocks for a given file, GPFS first
                     uses a round‐robin algorithm to spread the data
                     across all disks in the storage pool. After a
                     disk is selected, the location of the data
                     block on the disk is determined by the block
                     allocation map type. If cluster is
                     specified, GPFS attempts to allocate blocks in
                     clusters. Blocks that belong to a particular
                     file are kept adjacent to each other within
                     each cluster. If scatter is specified,
                     the location of the block is chosen randomly.

                     The cluster allocation method may provide
                     better disk performance for some disk
                     subsystems in relatively small installations.
                     The benefits of clustered block allocation
                     diminish when the number of nodes in the
                     cluster or the number of disks in a file system
                     increases, or when the file system’s free space
                     becomes fragmented. The cluster
                     allocation method is the default for GPFS
                     clusters with eight or fewer nodes and for file
                     systems with eight or fewer disks.

                     The scatter allocation method provides
                     more consistent file system performance by
                     averaging out performance variations due to
                     block location (for many disk subsystems, the
                     location of the data relative to the disk edge
                     has a substantial effect on performance). This
                     allocation method is appropriate in most cases
                     and is the default for GPFS clusters with more
                     than eight nodes or file systems with more than
                     eight disks.

                     The block allocation map type cannot be changed
                     after the storage pool has been created.

    -n NumNodes
            The estimated number of nodes that will mount the file
            system in the local cluster and all remote clusters.
            This is used as a best guess for the initial size of
            some file system data structures. The default is 32.
            This value can be changed after the file system has been
            created but it does not change the existing data
            structures. Only the newly created data structure is
            affected by the new value. For example, new storage
            pool.

            When you create a GPFS file system, you might want to
            overestimate the number of nodes that will mount the
            file system. GPFS uses this information for creating
            data structures that are essential for achieving maximum
            parallelism in file system operations (For more
            information, see GPFS architecture in IBM Spectrum
            Scale: Concepts, Planning, and Installation Guide ). If
            you are sure there will never be more than 64 nodes,
            allow the default value to be applied. If you are
            planning to add nodes to your system, you should specify
            a number larger than the default.

    [/snip from man mmcrfs]
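
The man page attributes the difference between the two policies to block location relative to the disk edge. A toy model can illustrate this. The Python sketch below is not the GPFS allocator, and it rests on several hypothetical assumptions:

* There is a single disk whose relative speed falls linearly from 1.0 at the outer edge to 0.5 at the inner edge.
* "cluster" is modelled as filling block addresses in order.
* "scatter" is modelled as picking free blocks at random.

The sketch reports the mean speed of the blocks written in each quarter of filling an empty disk.

```python
import random

def zone_speed(position: float) -> float:
    """Hypothetical relative throughput: 1.0 at the outer edge (0.0), 0.5 at the inner edge (1.0)."""
    return 1.0 - 0.5 * position

def simulate(policy: str, total_blocks: int = 10_000, phases: int = 4, seed: int = 0) -> list[float]:
    """Mean relative speed of the blocks written in each phase of filling an empty disk."""
    if policy not in ("cluster", "scatter"):
        raise ValueError(f"unknown policy {policy!r}")
    rng = random.Random(seed)
    order = list(range(total_blocks))  # "cluster": take block addresses in order
    if policy == "scatter":
        rng.shuffle(order)             # "scatter": take free blocks in random order
    per_phase = total_blocks // phases
    means = []
    for p in range(phases):
        chosen = order[p * per_phase:(p + 1) * per_phase]
        means.append(sum(zone_speed(b / total_blocks) for b in chosen) / len(chosen))
    return means

for policy in ("cluster", "scatter"):
    print(policy, [round(m, 2) for m in simulate(policy)])
```

Under these assumptions, the cluster policy starts fast and slows steadily as the disk fills. The scatter policy stays near the 0.75 average throughout, which is the "more consistent file system performance" trade-off described in the excerpt.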

    Regards,
    -Kums





    From:        Ivano Talamo <[email protected]>
    To:        <[email protected]>
    Date:        11/15/2017 11:25 AM
    Subject:        [gpfsug-discuss] Write performances and filesystem size
    Sent by:        [email protected]
------------------------------------------------------------------------




    Hello everybody,

    together with my colleagues we are currently running some tests on a
    new DSS G220 system, and we see some unexpected behaviour.

    What we see is that write performance (we have not tested reads
    yet) decreases as the filesystem size decreases.

    I will not go into the details of the tests, but here are some numbers:

    - with a filesystem using the full 1.2 PB space we get 14 GB/s as the
    sum of the disk activity on the two IO servers;
    - with a filesystem using half of the space we get 10 GB/s;
    - with a filesystem using 1/4 of the space we get 5 GB/s.
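
The figures above can be set against two simple hypotheses raised elsewhere in the thread. The first is that throughput is independent of capacity, which is what is expected when every vdisk spans all pdisks. The second is that throughput is proportional to capacity, which is what one would see if fewer physical resources were used. The Python sketch below only restates the three measurements; it adds no new data.

```python
def relative_throughput(observed: dict[float, float]) -> dict[float, float]:
    """Throughput at each capacity fraction, relative to the full-capacity run."""
    full = observed[1.0]
    return {fraction: gbps / full for fraction, gbps in observed.items()}

# Aggregate write throughput (GB/s) by filesystem capacity fraction, as reported above.
observed = {1.0: 14.0, 0.5: 10.0, 0.25: 5.0}

for fraction, rel in relative_throughput(observed).items():
    proportional = observed[1.0] * fraction  # if throughput scaled with capacity
    print(f"capacity {fraction:4.0%}: measured {observed[fraction]:4.1f} GB/s ({rel:.0%} of full), "
          f"constant model {observed[1.0]:.1f} GB/s, proportional model {proportional:.1f} GB/s")
```

With these numbers, the half-size and quarter-size runs reach about 71% and 36% of the full-size throughput. That lies between the constant case (100%) and the proportional case (50%, 25%), so neither simple model fits.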

    We also saw that performance is not affected by the vdisk layout,
    i.e. taking the full space with one big vdisk or 2 half-size vdisks
    per RG gives the same performance.

    To our understanding the IO should be spread evenly across all the
    pdisks in the declustered array, and looking at iostat all disks seem
    to be accessed. So there must be some other element that affects
    performance.

    Am I missing something? Is this expected behaviour, and does someone
    have an explanation for this?

    Thank you,
    Ivano
    _______________________________________________
    gpfsug-discuss mailing list
    gpfsug-discuss at spectrumscale.org
    http://gpfsug.org/mailman/listinfo/gpfsug-discuss


