Something I haven't heard in this discussion is the question of GPFS licensing.
I believe that once you export disks from a node, it becomes a server node and
the license may need to be changed from client to server. There goes the budget.
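For what it's worth, the designation change itself is a one-line administrative
step; the budget pain is the entitlement, not the command. A minimal sketch with
made-up node names (whether a server or an FPO designation is the right one for
a shared-nothing layout is something to confirm with IBM):

    # show the current client/server/FPO designation of each node
    mmlslicense -L
    # promote the nodes that will serve NSDs from client to server designation
    mmchlicense server --accept -N node01,node02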
-----Original Message-----
From: gpfsug-discuss-boun...@spectrumscale.org <gpfsug-discuss-boun...@spectrumscale.org> On Behalf Of Lukas Hejtmanek
Sent: Wednesday, March 14, 2018 4:28 AM
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Subject: Re: [gpfsug-discuss] Preferred NSD

Hello,

thank you for the insight. Well, the point is that I will get ~60 nodes with
120 NVMe disks in them, each about 2 TB in size. That means I will have 240 TB
of NVMe SSD that could build a nice shared scratch. Moreover, I have no other
hardware or place to put these SSDs into; they have to be in the compute nodes.

On Tue, Mar 13, 2018 at 10:48:21AM -0700, Alex Chekholko wrote:
> I would like to discourage you from building a large distributed clustered
> filesystem made of many unreliable components. You will need to
> overprovision your interconnect and will also spend a lot of time in
> "healing" or "degraded" state.
>
> It is typically cheaper to centralize the storage into a subset of nodes
> and configure those to be more highly available. E.g. of your 60 nodes,
> take 8, put all the storage into those, and make that a dedicated GPFS
> cluster with no compute jobs on those nodes. Again, you'll still need a
> really beefy and reliable interconnect to make this work.
>
> Stepping back; what is the actual problem you're trying to solve? I have
> certainly been in that situation before, where the problem is more like:
> "I have a fixed hardware configuration that I can't change, and I want to
> try to shoehorn a parallel filesystem onto that."
>
> I would recommend looking closer at your actual workloads. If this is a
> "scratch" filesystem and file access is mostly from one node at a time,
> it's not very useful to make two additional copies of that data on other
> nodes, and it will only slow you down.
>
> Regards,
> Alex
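For reference, the centralized layout Alex describes looks roughly like the
sketch below in NSD-stanza terms. All names are invented, and it assumes the
disks sit in enclosures that a pair of the dedicated servers can both reach,
which is what makes the high-availability part straightforward (and which
internal compute-node NVMe cannot offer):

    # each NSD lists a primary and a backup server, so one storage node can
    # fail without taking the disk offline (requires dual-attached storage)
    %nsd: nsd=nsd001 device=/dev/mapper/lun001 servers=stg01,stg02 usage=dataAndMetadata failureGroup=1
    %nsd: nsd=nsd002 device=/dev/mapper/lun002 servers=stg02,stg01 usage=dataAndMetadata failureGroup=2

    mmcrnsd -F dedicated.stanza
    mmcrfs scratch -F dedicated.stanza -T /scratch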
> On Tue, Mar 13, 2018 at 7:16 AM, Lukas Hejtmanek <xhejt...@ics.muni.cz> wrote:
> >
> > On Tue, Mar 13, 2018 at 10:37:43AM +0000, John Hearns wrote:
> > > Lukas,
> > > It looks like you are proposing a setup which uses your compute
> > > servers as storage servers also?
> >
> > yes, exactly. I would like to utilise the NVMe SSDs that are in every
> > compute server. Using them as a shared scratch area with GPFS is one
> > of the options.
> >
> > > * I'm thinking about the following setup:
> > > ~ 60 nodes, each with two enterprise NVMe SSDs, FDR IB interconnected
> > >
> > > There is nothing wrong with this concept, for instance see
> > > https://www.beegfs.io/wiki/BeeOND
> > >
> > > I have an NVMe filesystem which uses 60 drives, but there are 10
> > > servers. You should look at "failure zones" also.
> >
> > you still need the storage servers, and the local SSDs are used only
> > for caching, do I understand correctly?
> >
> > > From: gpfsug-discuss-boun...@spectrumscale.org [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP]
> > > Sent: Monday, March 12, 2018 4:14 PM
> > > To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
> > > Subject: Re: [gpfsug-discuss] Preferred NSD
> > >
> > > Hi Lukas,
> > >
> > > Check out FPO mode. That mimics Hadoop's data placement features.
> > > You can have up to 3 replicas of both data and metadata, but the
> > > downside, as you say, is that the wrong node failures will take your
> > > cluster down.
> > >
> > > You might want to check out something like Excelero's NVMesh (note:
> > > not an endorsement since I can't give such things), which can create
> > > logical volumes across all your NVMe drives. The product has erasure
> > > coding on its roadmap. I'm not sure if they've released that feature
> > > yet, but in theory it will give better fault tolerance *and* you'll
> > > get more efficient usage of your SSDs.
> > >
> > > I'm sure there are other ways to skin this cat too.
> > >
> > > -Aaron
> > >
> > > On March 12, 2018 at 10:59:35 EDT, Lukas Hejtmanek <xhejt...@ics.muni.cz> wrote:
> > > Hello,
> > >
> > > I'm thinking about the following setup:
> > > ~ 60 nodes, each with two enterprise NVMe SSDs, FDR IB interconnected
> > >
> > > I would like to set up a shared scratch area using GPFS and those
> > > NVMe SSDs. Each SSD as one NSD.
> > >
> > > I don't think 5 or more data/metadata replicas are practical here.
> > > On the other hand, multiple node failures are something to be
> > > expected.
> > >
> > > Is there a way to arrange that the local NSD is strongly preferred
> > > for storing data? I.e. so that a node failure most probably does not
> > > result in unavailable data for the other nodes?
> > >
> > > Or is there any other recommendation/solution to build shared
> > > scratch with GPFS in such a setup? ("Do not do it" included.)
> > >
> > > --
> > > Lukáš Hejtmánek
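On the original "strongly prefer the local NSD" question, the closest native
mechanism is the FPO write-affinity machinery Aaron mentions above. A minimal
sketch, with invented pool, node, and device names and untuned parameters
(system-pool NSDs for metadata are omitted):

    # pool stanza: a cluster-allocated pool with write affinity enabled, so the
    # first data replica lands on the NSD local to the node doing the writing
    %pool: pool=data blockSize=2M layoutMap=cluster allowWriteAffinity=yes writeAffinityDepth=1 blockGroupFactor=128

    # one NSD per local NVMe drive; a different failureGroup per node keeps
    # the replicas on different nodes
    %nsd: nsd=node01_nvme0 device=/dev/nvme0n1 servers=node01 usage=dataOnly failureGroup=1001 pool=data
    %nsd: nsd=node02_nvme0 device=/dev/nvme0n1 servers=node02 usage=dataOnly failureGroup=1002 pool=data

    mmcrnsd -F fpo.stanza
    # two copies of data and metadata rather than five
    mmcrfs scratch -F fpo.stanza -m 2 -M 2 -r 2 -R 2 -T /scratch

With writeAffinityDepth=1 the first copy stays on the writing node; the second
copy in a different failure group is what keeps the data available when a
single node fails, and it crosses the fabric on every write, which is exactly
the cost Alex warns about.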
--
Lukáš Hejtmánek

Linux Administrator only because
Full Time Multitasking Ninja
is not an official job title

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss