Aaron, Thanks for jumping onboard. It's nice to see others confirming this.
Sometimes I feel alone on this topic.

It should also be possible to use ZFS, with ZVOLs presented as block
devices, as a backing store for NSDs. I'm not claiming it's stable, a
good idea, or performant... but it should be possible. :) There are
various reports about it out there. It might at least be worth looking
into, compared to Linux "md raid", if one truly needs an all-software
solution that already exists. Something to think about and test.
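
A minimal sketch of what that could look like (pool layout, ZVOL size,
and all names below are just placeholders, not a recommendation):

  # build a pool on the local disks and carve a ZVOL out of it
  zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde
  zfs create -s -V 10T -o volblocksize=128K tank/nsd01

  # nsd.stanza: describe the ZVOL block device to GPFS
  %nsd:
    device=/dev/zvol/tank/nsd01
    nsd=nsd01
    servers=nsdserver1
    usage=dataAndMetadata
    failureGroup=1

  # create the NSD from the stanza file
  mmcrnsd -F nsd.stanza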

On Wed, Nov 30, 2016 at 11:15 PM, Aaron Knister <aaron.s.knis...@nasa.gov>
wrote:

> Thanks Zach, I was about to echo similar sentiments and you saved me a ton
> of typing :)
>
> Bob, I know this doesn't help you today since I'm pretty sure it's not yet
> available, but if one scours the interwebs they can find mention of
> something called Mestor.
>
> There's very very limited information here:
>
> - https://indico.cern.ch/event/531810/contributions/2306222/attachments/1357265/2053960/Spectrum_Scale-HEPIX_V1a.pdf
> - https://www.yumpu.com/en/document/view/5544551/ibm-system-x-gpfs-storage-server-stfc (slide 20)
>
> Sounds like if it were available it would fit this use case very well.
>
> I also had preliminary success with using sheepdog
> (https://sheepdog.github.io/sheepdog/) as a backing store for GPFS in a
> similar situation. It's perhaps, at a very high conceptual level, similar
> to Mestor: you erasure-code your data across the nodes with the SAS disks
> and then present those block devices to your NSD servers. I proved it
> could work but never tried to do much with it because the requirements
> changed.
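>
> A very rough sketch of that wiring, from memory (the erasure-coding and
> export syntax should be double-checked against the sheepdog and qemu
> docs; names and sizes are placeholders):
>
>   # on each node with local SAS disks, run the sheep daemon on a local store
>   sheep /var/lib/sheepdog
>
>   # format the cluster with erasure coding (e.g. 4 data + 2 parity strips)
>   dog cluster format -c 4:2
>
>   # create a virtual disk and map it to a block device on an NSD server
>   dog vdi create nsd01 1T
>   modprobe nbd
>   qemu-nbd -c /dev/nbd0 sheepdog:nsd01   # /dev/nbd0 then backs a GPFS NSD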
>
> My money would be on your first option-- creating local RAIDs and then
> replicating to give you availability in the event a node goes offline.
>
> -Aaron
>
>
> On 11/30/16 10:59 PM, Zachary Giles wrote:
>
>> Just remember that replication protects data availability, not
>> integrity. GPFS still requires the underlying block device to return
>> good data.
>>
>> If you're using it on plain disks (SAS or SSD) and a drive returns
>> corrupt data, GPFS won't know any better and will just deliver it to the
>> client. Further, if you do a partial-block read followed by a write
>> (read-modify-write), the corruption can propagate to both replicas.
>> There's also no efficient way to force use of the second replica if you
>> realize the first is bad, short of taking the first disk entirely
>> offline. And even then, while migrating data off the faulty drive,
>> there's no good way to prevent re-reading and rewriting other corrupt
>> data from the drive that holds the "good copy".
>>
>> Ideally, RAID's goal is to return only data that passed the RAID
>> algorithm's checks: data that either isn't corrupt or has been rebuilt
>> from parity. However, as we all know, RAID controllers are themselves
>> prone to failure for many reasons; but at least a drive behind one can
>> go bad in various ways (bad sectors, slow response, outright death,
>> worn SSD cells, etc.) without (hopefully) causing silent corruption..
>>
>> Just something to think about while considering replication ..
>>
>>
>>
>> On Wed, Nov 30, 2016 at 11:28 AM, Uwe Falke <uwefa...@de.ibm.com> wrote:
>>
>>     I have once set up a small system with just a few SSDs in two NSD
>>     servers, providing a scratch file system in a computing cluster.
>>     No RAID, two replicas. It works, as long as the admins do not do
>>     silly things (like rebooting servers in sequence without checking
>>     in between that the disks are up).
>>     Going for RAID without GPFS replication protects you against
>>     single-disk failures, but you're lost if just one of your NSD
>>     servers goes off.
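>>
>>     A minimal sketch of that kind of two-server, two-replica layout
>>     (device names, server names, and the file system name are just
>>     placeholders):
>>
>>       # nsd.stanza: one NSD per local SSD, each server in its own
>>       # failure group so the two replicas land on different servers
>>       %nsd: device=/dev/sdb nsd=ssd_a servers=nsdsrv1 usage=dataAndMetadata failureGroup=1
>>       %nsd: device=/dev/sdb nsd=ssd_b servers=nsdsrv2 usage=dataAndMetadata failureGroup=2
>>
>>       mmcrnsd -F nsd.stanza
>>       mmcrfs scratchfs -F nsd.stanza -m 2 -M 2 -r 2 -R 2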
>>
>>     FPO makes sense, IMHO, only if your NSD servers are also processing
>>     the data (and then you need to control that somehow).
>>
>>     Other ideas? What else can you do with GPFS and local disks beyond
>>     what you considered? I suppose nothing reasonable ...
>>
>>
>>     Mit freundlichen Grüßen / Kind regards
>>
>>
>>     Dr. Uwe Falke
>>
>>     IT Specialist
>>     High Performance Computing Services / Integrated Technology Services /
>>     Data Center Services
>>     ----------------------------------------------------------------------
>>     IBM Deutschland
>>     Rathausstr. 7
>>     09111 Chemnitz
>>     Phone: +49 371 6978 2165
>>     Mobile: +49 175 575 2877
>>     E-Mail: uwefa...@de.ibm.com
>>     ----------------------------------------------------------------------
>>     IBM Deutschland Business & Technology Services GmbH /
>>     Geschäftsführung: Frank Hammer, Thorsten Moehring
>>     Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht
>>     Stuttgart, HRB 17122
>>
>>
>>
>>
>>     From:    "Oesterlin, Robert" <robert.oester...@nuance.com>
>>     To:      gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>>     Date:    11/30/2016 03:34 PM
>>     Subject: [gpfsug-discuss] Strategies - servers with local SAS disks
>>     Sent by: gpfsug-discuss-boun...@spectrumscale.org
>>
>>
>>
>>     Looking for feedback/strategies in setting up several GPFS servers
>>     with local SAS disks. They would all be part of the same file system.
>>     The systems are all similar in configuration: 70 x 4 TB drives each.
>>
>>     Options I'm considering:
>>
>>     - Create RAID arrays of the disks on each server (worried about the
>>     RAID rebuild time when a drive fails with 4, 6, 8 TB drives)
>>     - No RAID, with 2 replicas and a single drive per NSD. When a drive
>>     fails, recreate the NSD, but then I need to fix up the data
>>     replication via a restripe (rough command sketch below)
>>     - FPO, with multiple failure groups, letting the system manage
>>     replica placement and then have GPFS do the restripe automatically
>>     on disk failure
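>>
>>     Rough command sketch for the second option's recovery flow (file
>>     system name, NSD name, and stanza file are placeholders):
>>
>>       mmdeldisk fs1 nsd_bad             # remove the NSD on the failed drive
>>       mmcrnsd -F newdisk.stanza         # define an NSD on the replacement drive
>>       mmadddisk fs1 -F newdisk.stanza   # add it back into the file system
>>       mmrestripefs fs1 -r               # restore replication of affected files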
>>
>>     Comments or other ideas welcome.
>>
>>     Bob Oesterlin
>>     Sr Principal Storage Engineer, Nuance
>>     507-269-0413
>>
>>     _______________________________________________
>>     gpfsug-discuss mailing list
>>     gpfsug-discuss at spectrumscale.org
>>     http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>>
>>
>>
>> --
>> Zach Giles
>> zgi...@gmail.com
>>
>>
>>
>>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
>
>



-- 
Zach Giles
zgi...@gmail.com
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
