So as a basis for our archive solution, we're using a GPFS cluster in a stretched configuration: two sites separated by about 20 ms of latency over a 10G link. Each end has 2 protocol servers serving NFS and 3 NSD servers. There are identical disk arrays and LTFS/EE at both ends, and all metadata and user data are replicated to both sites.
We had a fiber issue for about 8 hours yesterday, and as expected (since there are only 5 quorum nodes, 3 local and 2 at the far end) the far end fell off the cluster and downed all the NSDs on the remote arrays. There's about 123T of data at each end, 6 million files in there so far.

So after the fiber came back up from the several-hour downtime, I ran 'mmchdisk archive start -a'. That was at 17:45 yesterday. I'm now 20 hours in, at:

  62.15 % complete on Fri Nov 18 13:52:59 2016 ( 4768429 inodes with total 173675926 MB data processed)
  62.17 % complete on Fri Nov 18 13:53:20 2016 ( 4769416 inodes with total 173710731 MB data processed)
  62.18 % complete on Fri Nov 18 13:53:40 2016 ( 4772481 inodes with total 173762456 MB data processed)

Network statistics indicate that the 3 local NSD servers are each pushing out packets at about 400 Mbytes/second, which means the 10G pipe is pretty damned close to totally packed full, and the 3 remotes are sending back ACKs for all of that data.

Rough back-of-envelope calculations indicate that (a) at 62% after 20 hours, the whole pass works out to roughly 32 hours total, and (b) a 10G link takes about 29 hours at full blast to move 123T of data. So it certainly *looks* like it's resending everything. And that's even though at least 100T of that 123T is test data that was written by one of our users back on Nov 12/13, and thus theoretically *should* already have been at the remote site.

Any ideas what's going on here?

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
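For what it's worth, here's a quick sketch of the back-of-envelope arithmetic above. The assumptions are mine, not from the post: "123T" is read as binary terabytes (TiB), all three local NSD servers are taken to be sending ~400 MB/s apiece, and protocol overhead is ignored.

```python
# Sanity check of the back-of-envelope numbers in the post.
# Assumptions: "123T" = 123 TiB, ~400 MB/s per local NSD server,
# no allowance for TCP/replication protocol overhead.

TB = 2**40                      # bytes in a binary terabyte

data_total = 123 * TB           # data held at each site
link_rate = 10e9 / 8            # 10 Gbit/s link, in bytes/s
observed_rate = 3 * 400e6      # three NSD servers at ~400 MB/s apiece

# (b) time to push all 123T through the link at full blast
hours_at_line_rate = data_total / link_rate / 3600

# same calculation at the observed aggregate send rate (~1.2 GB/s)
hours_at_observed = data_total / observed_rate / 3600

# (a) total time implied by the mmchdisk progress (62% done after 20 hours)
hours_projected = 20 / 0.62

print(f"full resync at line rate:  {hours_at_line_rate:.1f} h")
print(f"full resync at ~1.2 GB/s:  {hours_at_observed:.1f} h")
print(f"projected from progress:   {hours_projected:.1f} h")
```

All three figures land in the same ~30-hour ballpark, which is exactly why it looks like mmchdisk is re-copying every block rather than just the delta.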
