Hi Olaf, yes, we have separate vdisks for MD: 2 vdisks, each 100 GB, with a 1 MB block size and 3-way replication.
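For completeness, vdisk stanzas matching that description would look roughly like the following (a sketch only: the recovery-group, declustered-array, and vdisk names and the usage/pool attributes are hypothetical; the size, block size, and RAID code are the values above):

```
%vdisk: vdiskName=md_rg1 rg=rg1 da=DA1 blocksize=1m size=100g raidCode=3WayReplication diskUsage=metadataOnly pool=system
%vdisk: vdiskName=md_rg2 rg=rg2 da=DA1 blocksize=1m size=100g raidCode=3WayReplication diskUsage=metadataOnly pool=system
```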
________________________________
From: [email protected] on behalf of Olaf Weiser <[email protected]>
Sent: Thursday, November 16, 2017 1:03 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Write performances and filesystem size

Thanks, that makes it a bit clearer. Since your vdisks are big enough to span all pdisks, each of your tests (1/1, 1/2, or 1/4 of the capacity) should deliver the same performance.

You mentioned the vdisk layout: in your full-capacity test, did you use just one vdisk per RG, i.e. 2 in total for data? And what about MD: did you create separate vdisks for MD, and if so, what size?

Sent from IBM Verse

Ivano Talamo --- Re: [gpfsug-discuss] Write performances and filesystem size ---

From: "Ivano Talamo" <[email protected]>
To: "gpfsug main discussion list" <[email protected]>
Date: Thu 16.11.2017 03:49
Subject: Re: [gpfsug-discuss] Write performances and filesystem size
________________________________

Hello Olaf,

yes, I confirm that this is the Lenovo version of the ESS GL2, so 2 enclosures / 4 drawers / 166 disks in total. Each recovery group has one declustered array with all disks inside, so the vdisks use all the physical disks, even when a vdisk is only 1/4 of the total size.

Regarding the layout allocation, we used scatter. The tests were done on a freshly created filesystem, so there is no close-to-full effect. And we ran gpfsperf write seq.

Thanks,
Ivano

On 16/11/17 04:42, Olaf Weiser wrote:
> Sure... as long as we assume that really all physical disks are used. The statement "1/2 or 1/4" might mean that one or two complete enclosures were eliminated; that is why I was asking for more details.
>
> I don't see this degradation in my environments. As long as the vdisks are big enough to span all pdisks (which should be the case for capacities in the TB range), the performance stays the same.
>
> Sent from IBM Verse
>
> Jan-Frode Myklebust --- Re: [gpfsug-discuss] Write performances and filesystem size ---
>
> From: "Jan-Frode Myklebust" <[email protected]>
> To: "gpfsug main discussion list" <[email protected]>
> Date: Wed 15.11.2017 21:35
> Subject: Re: [gpfsug-discuss] Write performances and filesystem size
>
> ------------------------------------------------------------------------
>
> Olaf, this looks like a Lenovo «ESS GLxS» version. It should use the same number of spindles for any filesystem size, so I would also expect the filesystems to perform the same.
>
> -jf
>
> On Wed, 15 Nov 2017 at 11:26, Olaf Weiser <[email protected]> wrote:
>
> To add a comment... very simply, it depends on how you allocate the physical block storage. If you simply use fewer physical resources when reducing the capacity (in the same ratio), you get what you see.
>
> So you need to tell us how you allocate your block storage. (Are you using RAID controllers? Where are your LUNs coming from? Are fewer RAID groups involved when you reduce the capacity?)
>
> GPFS can be configured to deliver pretty much what the hardware can deliver. If you reduce resources you get less; if you enhance your hardware you get more, almost regardless of the total capacity in #blocks.
>
> From: "Kumaran Rajaram" <[email protected]>
> To: gpfsug main discussion list <[email protected]>
> Date: 11/15/2017 11:56 AM
> Subject: Re: [gpfsug-discuss] Write performances and filesystem size
> Sent by: [email protected]
> ------------------------------------------------------------------------
>
> Hi,
>
> >> Am I missing something? Is this expected behaviour, and does someone have an explanation for it?
> Based on your scenario, write degradation as the file system is populated is possible if you had formatted the file system with "-j cluster".
>
> For consistent file-system performance, we recommend mmcrfs "-j scatter" (the layoutMap parameter). Also, make sure the mmcrfs "-n" value is set properly.
>
> [snip from mmlsfs]
> # mmlsfs <fs> | egrep 'Block allocation| Estimated number'
>  -j                 scatter                  Block allocation type
>  -n                 128                      Estimated number of nodes that will mount file system
> [/snip]
>
> [snip from man mmcrfs]
> layoutMap={scatter|cluster}
>     Specifies the block allocation map type. When allocating blocks for a given file, GPFS first uses a round-robin algorithm to spread the data across all disks in the storage pool. After a disk is selected, the location of the data block on the disk is determined by the block allocation map type. If cluster is specified, GPFS attempts to allocate blocks in clusters. Blocks that belong to a particular file are kept adjacent to each other within each cluster. If scatter is specified, the location of the block is chosen randomly.
>
>     The cluster allocation method may provide better disk performance for some disk subsystems in relatively small installations. The benefits of clustered block allocation diminish when the number of nodes in the cluster or the number of disks in a file system increases, or when the file system's free space becomes fragmented. The cluster allocation method is the default for GPFS clusters with eight or fewer nodes and for file systems with eight or fewer disks.
>
>     The scatter allocation method provides more consistent file system performance by averaging out performance variations due to block location (for many disk subsystems, the location of the data relative to the disk edge has a substantial effect on performance). This allocation method is appropriate in most cases and is the default for GPFS clusters with more than eight nodes or file systems with more than eight disks.
>
>     The block allocation map type cannot be changed after the storage pool has been created.
>
> -n NumNodes
>     The estimated number of nodes that will mount the file system in the local cluster and all remote clusters. This is used as a best guess for the initial size of some file system data structures. The default is 32. This value can be changed after the file system has been created, but it does not change the existing data structures; only newly created data structures (for example, a new storage pool) are affected by the new value.
>
>     When you create a GPFS file system, you might want to overestimate the number of nodes that will mount the file system. GPFS uses this information for creating data structures that are essential for achieving maximum parallelism in file system operations (for more information, see GPFS architecture in IBM Spectrum Scale: Concepts, Planning, and Installation Guide). If you are sure there will never be more than 64 nodes, allow the default value to be applied. If you are planning to add nodes to your system, you should specify a number larger than the default.
> [/snip]
>
> Regards,
> -Kums
>
> From: Ivano Talamo <[email protected]>
> To: <[email protected]>
> Date: 11/15/2017 11:25 AM
> Subject: [gpfsug-discuss] Write performances and filesystem size
> Sent by: [email protected]
> ------------------------------------------------------------------------
>
> Hello everybody,
>
> together with my colleagues we are currently running some tests on a new DSS G220 system, and we see some unexpected behaviour.
>
> What we see is that write performance (we have not tested reads yet) decreases as the filesystem size decreases.
>
> I will not go into the details of the tests, but here are some numbers:
>
> - with a filesystem using the full 1.2 PB of space we get 14 GB/s as the sum of the disk activity on the two IO servers;
> - with a filesystem using half of the space we get 10 GB/s;
> - with a filesystem using 1/4 of the space we get 5 GB/s.
>
> We also saw that performance is not affected by the vdisk layout, i.e. taking the full space with one big vdisk or two half-size vdisks per RG gives the same performance.
>
> To our understanding the IO should be spread evenly across all the pdisks in the declustered array, and looking at iostat all disks seem to be accessed. So there must be some other element that affects performance.
>
> Am I missing something? Is this expected behaviour, and does someone have an explanation for it?
> Thank you,
> Ivano
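Kums' recommendation above boils down to an mmcrfs invocation along these lines (a sketch only; the device name and NSD stanza file are hypothetical, and only "-j scatter" and "-n" come from the recommendation):

```shell
# Create the filesystem with scatter block allocation and an explicit
# estimate of the number of mounting nodes. Both are set at creation
# time; the block allocation map type cannot be changed afterwards.
mmcrfs fs1 -F nsd.stanza -j scatter -n 128

# Verify the settings:
mmlsfs fs1 | egrep 'Block allocation| Estimated number'
```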
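As a quick sanity check on the figures reported above (14, 10, and 5 GB/s), the degradation tracks capacity almost linearly rather than staying flat; a small awk sketch makes the ratios explicit:

```shell
# Fraction of full-capacity write throughput retained at each filesystem
# size, using the GB/s figures reported in the thread (14, 10, 5).
awk 'BEGIN {
  full = 14; half = 10; quarter = 5
  printf "half-size fs:    %.0f%% of full throughput\n", 100 * half / full
  printf "quarter-size fs: %.0f%% of full throughput\n", 100 * quarter / full
}'
```

If performance depended only on the number of spindles, both figures should be close to 100%.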
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
