Aaron, Thanks for jumping on board. It's nice to see others confirming this. Sometimes I feel alone on this topic.

It should also be possible to use ZFS, with ZVOLs presented as block devices, as the backing store for NSDs. I'm not claiming it's stable, nor a good idea, nor performant... but it should be possible. :) There are various reports about it out there. It might at least be worth looking into alongside Linux "md raid" if one truly needs an all-software solution that already exists. Something to think about and test.
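Roughly, the plumbing would look like the sketch below. Untested, and all the names are made up (pool "tank", server "nsd01", the stanza values); the volblocksize is just a guess that keeping it near the GPFS block size helps:

    # On the NSD server: carve a sparse ZVOL out of an existing zpool.
    # It shows up as an ordinary block device under /dev/zvol/.
    zfs create -s -V 4T -o volblocksize=128k tank/gpfs_vol01

    # NSD stanza pointing at the ZVOL (zvol.stanza):
    %nsd:
      device=/dev/zvol/tank/gpfs_vol01
      nsd=zvol_nsd01
      servers=nsd01
      usage=dataAndMetadata
      failureGroup=1

    # Register it with GPFS
    mmcrnsd -F zvol.stanza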
On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister <aaron.s.knis...@nasa.gov> wrote:
> Thanks Zach, I was about to echo similar sentiments and you saved me a ton
> of typing :)
>
> Bob, I know this doesn't help you today since I'm pretty sure it's not yet
> available, but if one scours the interwebs they can find mention of
> something called Mestor.
>
> There's very, very limited information here:
>
> - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf
> - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20)
>
> Sounds like, if it were available, it would fit this use case very well.
>
> I also had preliminary success using sheepdog
> (https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a
> similar situation. At a very high conceptual level it's perhaps similar
> to Mestor: you erasure-code your data across the nodes with the SAS
> disks and then present those block devices to your NSD servers. I proved
> it could work but never tried to do much with it because the
> requirements changed.
>
> My money would be on your first option -- creating local RAIDs and then
> replicating to give you availability in the event a node goes offline.
>
> -Aaron
>
> On 11/30/16 10:59 PM, Zachary Giles wrote:
>> Just remember that replication protects data availability, not
>> integrity. GPFS still requires the underlying block device to return
>> good data.
>>
>> If you're using it on plain disks (SAS or SSD) and a drive returns
>> corrupt data, GPFS won't know any better and will just deliver it to
>> the client. Further, if you do a partial read of corrupt data followed
>> by a write, both replicas could be destroyed. There's also no efficient
>> way to force use of the second replica if you realize the first is bad,
>> short of taking the first disk entirely offline; and in that case,
>> while restriping off the faulty drive, there's no good way to prevent
>> read-rewrite of other corrupt data on the drive that holds the "good
>> copy".
>>
>> Ideally, RAID's goal is to return only data that passed the RAID
>> algorithm's checks -- so it shouldn't be corrupt, or it gets rebuilt
>> from parity. However, as we all know, RAID controllers are definitely
>> prone to failures as well, for many reasons, but at least a drive can
>> go bad in various ways (bad sectors, slow, just dead, poor SSD cell
>> wear, etc.) without (hopefully) silent corruption.
>>
>> Just something to think about while considering replication.
>>
>> On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke <uwefa...@de.ibm.com> wrote:
>>
>> I once set up a small system with just a few SSDs in two NSD servers,
>> providing a scratch file system in a computing cluster.
>> No RAID, two replicas.
>> It works, as long as the admins do not do silly things (like rebooting
>> servers in sequence without checking for disks being up in between).
>> Going for RAIDs without GPFS replication protects you against single
>> disk failures, but you're lost if just one of your NSD servers goes off.
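(As a concrete aside: a minimal two-server, two-replica layout like Uwe describes might look roughly like the following. Device paths, NSD names, and server names are hypothetical; the point is one failure group per server, so the two replicas never land on the same server.)

    # disks.stanza -- one local disk per server, one failure group per server
    %nsd: device=/dev/sdb nsd=n01_sdb servers=nsd01 usage=dataAndMetadata failureGroup=1
    %nsd: device=/dev/sdb nsd=n02_sdb servers=nsd02 usage=dataAndMetadata failureGroup=2

    mmcrnsd -F disks.stanza
    # -m 2 -r 2: two copies of metadata and data, placed in different failure groups
    mmcrfs scratch -F disks.stanza -m 2 -r 2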
>>
>> FPO only makes sense, IMHO, if your NSD servers are also processing
>> the data (and then you need to control that somehow).
>>
>> Other ideas? What else can you do with GPFS and local disks beyond what
>> you considered? I suppose nothing reasonable ...
>>
>>
>> Mit freundlichen Grüßen / Kind regards
>>
>> Dr. Uwe Falke
>>
>> IT Specialist
>> High Performance Computing Services / Integrated Technology Services /
>> Data Center Services
>> -------------------------------------------------------------------------
>> IBM Deutschland
>> Rathausstr. 7
>> 09111 Chemnitz
>> Phone: +49 371 6978 2165
>> Mobile: +49 175 575 2877
>> E-Mail: uwefa...@de.ibm.com
>> -------------------------------------------------------------------------
>> IBM Deutschland Business & Technology Services GmbH / Geschäftsführung:
>> Frank Hammer, Thorsten Moehring
>> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
>> HRB 17122
>>
>>
>> From: "Oesterlin, Robert" <robert.oester...@nuance.com>
>> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>> Date: 11/30/2016 03:34 PM
>> Subject: [gpfsug-discuss] Strategies - servers with local SAS disks
>> Sent by: gpfsug-discuss-boun...@spectrumscale.org
>>
>>
>> Looking for feedback/strategies in setting up several GPFS servers with
>> local SAS disks. They would all be part of the same file system. The
>> systems are all similar in configuration - 70 4TB drives.
>>
>> Options I'm considering:
>>
>> - Create RAID arrays of the disks on each server (worried about the
>> RAID rebuild time when a drive fails with 4, 6, 8 TB drives)
>> - No RAID, with 2 replicas and a single drive per NSD. When a drive
>> fails, recreate the NSD - but then I need to fix up the data
>> replication via restripe
>> - FPO - with multiple failure groups, letting the system manage replica
>> placement and then have GPFS do the restripe on disk failure
>> automatically
>>
>> Comments or other ideas welcome.
>>
>> Bob Oesterlin
>> Sr Principal Storage Engineer, Nuance
>> 507-269-0413
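For what it's worth, the drive-failure handling in that second option is mechanical but manual. A rough sketch, from memory (filesystem and NSD names hypothetical; check the man pages before trusting any of it):

    # A drive backing NSD n01_sdq failed: remove it from the filesystem.
    # mmdeldisk copies the affected data from the surviving replica as it goes.
    mmdeldisk scratch n01_sdq

    # After swapping the drive, recreate the NSD and add it back ...
    mmcrnsd -F replacement.stanza
    mmadddisk scratch -F replacement.stanza

    # ... then restore full replication for any files left with one copy
    mmrestripefs scratch -r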
>>
>> --
>> Zach Giles
>> zgi...@gmail.com
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776

--
Zach Giles
zgi...@gmail.com
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss