Hi Valdis, I tired with the grep replaced with 'ls -ls' and 'md5sum', I don't think this is a data integrity issue, thankfully:
$ ./pipetestls.sh 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16 /srv/gsfs0/projects/pipetest.tmp.txt 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 /home/griznog/pipetest.tmp.txt $ ./pipetestmd5.sh 15cb81a85c9e450bdac8230309453a0a /srv/gsfs0/projects/pipetest.tmp.txt 15cb81a85c9e450bdac8230309453a0a /home/griznog/pipetest.tmp.txt And replacing grep with 'file' even properly sees the files as ASCII: $ ./pipetestfile.sh /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines I'll poke a little harder at grep next and see what the difference in strace of each reveals. Thanks, jbh On Wed, Feb 14, 2018 at 7:08 AM, <valdis.kletni...@vt.edu> wrote: > On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said: > > > # ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 > /home/griznog/pipetest.tmp.txt > > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10 > > /srv/gsfs0/projects/pipetest.tmp.txt > > > > We can "fix" the user case that exposed this by not using a temp file or > > inserting a sleep, but I'd still like to know why GPFS is behaving this > way > > and make it stop. > > May be related to replication, or other behind-the-scenes behavior. > > Consider this example - 4.2.3.6, data and metadata replication both > set to 2, 2 sites 95 cable miles apart, each is 3 Dell servers with a full > fiberchannel mesh to 3 Dell MD34something arrays. > > % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test; sleep > 5; ls -ls sync.test; sleep 5; ls -ls sync.test > 4096+0 records in > 4096+0 records out > 4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s > 2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > 8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test > > Notice that the first /bin/ls shouldn't be starting until after the dd has > completed - at which point it's only allocated half the blocks needed to > hold > the 4M of data at one site. 5 seconds later, it's allocated the blocks at > both > sites and thus shows the full 8M needed for 2 copies. > > I've also seen (but haven't replicated it as I write this) a small file > (4-8K > or so) showing first one full-sized block, then a second full-sized block, > and > then dropping back to what's needed for 2 1/32nd fragments. That had me > scratching my head > > Having said that, that's all metadata fun and games, while your case > appears to have some problems with data integrity (which is a whole lot > scarier). It would be *really* nice if we understood the problem here. > > The scariest part is: > > > The first grep | wc -l returns 1, because grep outputs "Binary file > /path/to/ > > gpfs/mount/test matches" > > which seems to be implying that we're failing on semantic consistency. > Basically, your 'cat' command is completing and closing the file, but then > a > temporally later open of the same find is reading something other that > only the > just-written data. My first guess is that it's a race condition similar > to the > following: The cat command is causing a write on one NSD server, and the > first > grep results in a read from a *different* NSD server, returning the data > that > *used* to be in the block because the read actually happens before the > first > NSD server actually completes the write. > > It may be interesting to replace the grep's with pairs of 'ls -ls / dd' > commands to grab the > raw data and its size, and check the following: > > 1) does the size (both blocks allocated and logical length) reported by > ls match the amount of data actually read by the dd? > > 2) Is the file length as actually read equal to the written length, or > does it > overshoot and read all the way to the next block boundary? > > 3) If the length is correct, what's wrong with the data that's telling > grep that > it's a binary file? ( od -cx is your friend here). > > 4) If it overshoots, is the remainder all-zeros (good) or does it return > semi-random > "what used to be there" data (bad, due to data exposure issues)? > > (It's certainly not the most perplexing data consistency issue I've hit in > 4 decades - the > winner *has* to be a intermittent data read corruption on a GPFS 3.5 > cluster that > had us, IBM, SGI, DDN, and at least one vendor of networking gear all > chasing our > tails for 18 months before we finally tracked it down. :) > > _______________________________________________ > gpfsug-discuss mailing list > gpfsug-discuss at spectrumscale.org > http://gpfsug.org/mailman/listinfo/gpfsug-discuss > >
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss