Zach,

GPFS replication does not automatically compare the replica copies. It protects against one part of the storage being down (i.e. one failure group (FG), or two with 3-fold replication). If both replica copies are readable, how should GPFS know which version is the good one?
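To make the outage-versus-corruption distinction concrete, here is a minimal Python sketch of a toy model. It is not GPFS's actual read path, and `Replica`, `read_without_checksum`, `read_with_checksum` and `compare_replicas` are hypothetical names. It assumes a block stored as two copies and, for the checksummed variant, a CRC32 recorded at write time that stands in for end-to-end checksumming of the kind GNR provides. Plain replication can skip a copy that is down, but it has no basis for rejecting a copy that is readable yet wrong.

```python
from __future__ import annotations

import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    """One copy of a block in a hypothetical toy model (e.g. one per failure group)."""
    data: Optional[bytes]  # None models a disk that is down or unreachable


def read_without_checksum(replicas: list[Replica]) -> bytes:
    """Outage protection only: skip unreadable copies, return the first readable one."""
    for replica in replicas:
        if replica.data is not None:
            return replica.data  # returned as-is, even if its contents are corrupt
    raise OSError("all replicas unavailable")


def read_with_checksum(replicas: list[Replica], expected_crc: int) -> bytes:
    """E2E-style read: accept a copy only if it matches a checksum recorded at write time."""
    for replica in replicas:
        if replica.data is not None and zlib.crc32(replica.data) == expected_crc:
            return replica.data
    raise OSError("no readable replica matches the stored checksum")


def compare_replicas(replicas: list[Replica]) -> str:
    """Without a checksum, comparing copies can detect a conflict but not resolve it."""
    copies = {replica.data for replica in replicas if replica.data is not None}
    if not copies:
        return "unavailable"
    if len(copies) == 1:
        return "consistent"
    return "conflict: cannot tell which copy is good"


if __name__ == "__main__":
    good = b"original block"
    crc = zlib.crc32(good)  # assumed to be recorded when the block was first written
    replicas = [Replica(b"corrupted blk!"), Replica(good)]
    print(read_without_checksum(replicas))    # b'corrupted blk!'
    print(read_with_checksum(replicas, crc))  # b'original block'
    print(compare_replicas(replicas))         # conflict: cannot tell which copy is good
    print(read_without_checksum([Replica(None), Replica(good)]))  # b'original block'
```

In this model a downed disk (`Replica(None)`) is handled transparently by either reader, while a corrupted but readable copy is only caught when an independent checksum exists; a plain comparison can flag the mismatch but cannot say which side is correct.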
There are tools in 4.x to compare the replicas, but only use them from 4.2 onward (there are problems with prior versions). Even then, you still need to decide which is the "good" copy (there is a consistency check on metadata (MD) replicas, but correct and incorrect data blocks cannot be auto-detected, for obvious reasons). End-to-end (E2E) checksumming (as in GNR) would of course help here.

Kind regards

Dr. Uwe Falke
IT Specialist
High Performance Computing Services / Integrated Technology Services / Data Center Services
----------------------------------------
IBM Deutschland
Rathausstr. 7
09111 Chemnitz
Phone: +49 371 6978 2165
Mobile: +49 175 575 2877
E-Mail: [email protected]
----------------------------------------
IBM Deutschland Business & Technology Services GmbH / Managing Directors: Frank Hammer, Thorsten Moehring
Registered office: Ehningen / Register court: Amtsgericht Stuttgart, HRB 17122

From: Zachary Giles <[email protected]>
To: gpfsug main discussion list <[email protected]>
Date: 04/29/2016 06:22 AM
Subject: [gpfsug-discuss] GPFS and replication.. not a mirror?
Sent by: [email protected]

Fellow GPFS Users,

I have a silly question about file replicas... I've been playing around with copies=2 (or 3), hoping that this would protect against data corruption on poor-quality RAID controllers, but it seems that if I purposefully corrupt blocks on a LUN used by GPFS, the "replica" doesn't take over; rather, GPFS just returns corrupt data. This happens whether I just "dd" onto the disk or break the RAID controller somehow by yanking a whole chassis so that the controller responds poorly for a few seconds.
Originally my thinking was that replicas were for mirroring and that GPFS would somehow return whichever is the "good" copy of your data, but now I'm thinking it's just intended for better file placement, such as having a near replica and a far replica so you don't have to cross buildings for access, etc. That and/or disk outages where the outage is not corruption, just simply an outage, either by failure or for disk moves, SAN rewiring, etc. In those cases you wouldn't have to "move" all the data, since you already have a second copy. I can see how that would make sense.

Somehow I guess I always knew this, but it seems many people say they will just turn on copies=2 and be "safe", and that's not the case.

Which way is intended? Has anyone else had experience with this realization?

Thanks,
-Zach

--
Zach Giles
[email protected]
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
