Hi all,

We found this a while back and IBM fixed it.  Here’s your answer:  
http://www-01.ibm.com/support/docview.wss?uid=isg1IV87385

Cheers,
-Bryan

From: gpfsug-discuss-boun...@spectrumscale.org 
[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of John Hanks
Sent: Wednesday, February 14, 2018 12:31 PM
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Subject: Re: [gpfsug-discuss] Odd behavior with cat followed by grep.

Straces are interesting, but don't immediately open my eyes:

strace of grep on NFS (works as expected)

openat(AT_FDCWD, "/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0
ioctl(3, TCGETS, 0x7ffe2c26b0b0)        = -1 ENOTTY (Inappropriate ioctl for device)
read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768
lseek(3, 32768, SEEK_HOLE)              = 530721
lseek(3, 32768, SEEK_SET)               = 32768
fstat(1, {st_mode=S_IFREG|0644, st_size=5977, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f3bf6c43000
write(1, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 8192chr1

strace on GPFS (thinks file is binary)

openat(AT_FDCWD, "/srv/gsfs0/projects/pipetest.tmp.txt", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0
ioctl(3, TCGETS, 0x7ffc9b52caa0)        = -1 ENOTTY (Inappropriate ioctl for device)
read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768
lseek(3, 32768, SEEK_HOLE)              = 262144
lseek(3, 32768, SEEK_SET)               = 32768
fstat(1, {st_mode=S_IFREG|0644, st_size=6011, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd45ee88000
close(3)                                = 0
write(1, "Binary file /srv/gsfs0/projects/"..., 72Binary file 
/srv/gsfs0/projects/levinson/xwzhu/pipetest.tmp.txt matches
) = 72

Do the lseek() results indicate that grep on the GPFS-mounted version thinks 
the file is sparse? For comparison I straced md5sum in place of grep; it does 
not lseek() with SEEK_HOLE, and its access pattern looks identical in both 
cases:

open("/home/griznog/pipetest.tmp.txt", O_RDONLY) = 3
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=530721, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb7d2c2b000
read(3, "chr1\t43452652\t43452652\tL1HS\tlib1"..., 32768) = 32768
...[reads clipped]...
read(3, "", 24576)                      = 0
lseek(3, 0, SEEK_CUR)                   = 530721
close(3)                                = 0
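
For comparison, here's what a genuinely sparse file looks like to the same 
tools - a generic sketch on a local filesystem, nothing GPFS-specific, and the 
path is made up:

```shell
# Write 256 KiB of text, then extend to 512 KiB without allocating
# blocks, leaving a real hole in the second half of the file.
seq 1 60000 | head -c 262144 > /tmp/holetest.txt
truncate -s 524288 /tmp/holetest.txt

# Allocated blocks come out well short of the apparent 524288 bytes.
ls -ls /tmp/holetest.txt

# The hole reads back as NULs, so on filesystems with hole support
# GNU grep will typically say "Binary file ... matches" here rather
# than printing the matching line.
grep 12345 /tmp/holetest.txt || true
```

If GPFS transiently reports allocation ending at 262144 for a 530721-byte 
file, grep would see exactly this picture even though the data is all there.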


jbh


On Wed, Feb 14, 2018 at 9:51 AM, Aaron Knister 
<aaron.s.knis...@nasa.gov> wrote:
Just speculating here (also known as making things up) but I wonder if
grep is somehow using the file's size in its determination of binary
status. I also see mmap in the strace, so maybe there's some issue with
mmap where some internal GPFS buffer is getting truncated
inappropriately, leaving a bunch of null values that get returned
to grep.
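
A quick generic way to check for that (nothing GPFS-specific; the /tmp file 
below is just a demo stand-in for the file on the GPFS mount):

```shell
# Count NUL bytes seen when reading a file; a clean text file should
# report 0. The demo file deliberately contains two NULs.
printf 'abc\0\0def\n' > /tmp/nulcheck.demo
tr -cd '\0' < /tmp/nulcheck.demo | wc -c   # prints 2 for this demo file
```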

-Aaron

On 2/14/18 10:21 AM, John Hanks wrote:
> Hi Valdis,
>
> I tried it with grep replaced by 'ls -ls' and 'md5sum'; I don't think
> this is a data integrity issue, thankfully:
>
> $ ./pipetestls.sh
> 256 -rw-r--r-- 1 39073 3001 530721 Feb 14 07:16
> /srv/gsfs0/projects/pipetest.tmp.txt
> 0 -rw-r--r-- 1 39073 3953 530721 Feb 14 07:16 /home/griznog/pipetest.tmp.txt
>
> $ ./pipetestmd5.sh
> 15cb81a85c9e450bdac8230309453a0a  /srv/gsfs0/projects/pipetest.tmp.txt
> 15cb81a85c9e450bdac8230309453a0a  /home/griznog/pipetest.tmp.txt
>
> And replacing grep with 'file' correctly identifies both files as ASCII:
> $ ./pipetestfile.sh
> /srv/gsfs0/projects/pipetest.tmp.txt: ASCII text, with very long lines
> /home/griznog/pipetest.tmp.txt: ASCII text, with very long lines
>
> I'll poke a little harder at grep next and see what the difference in
> strace of each reveals.
>
> Thanks,
>
> jbh
>
>
>
>
> On Wed, Feb 14, 2018 at 7:08 AM, 
> <valdis.kletni...@vt.edu> wrote:
>
>     On Wed, 14 Feb 2018 06:20:32 -0800, John Hanks said:
>
>     > #  ls -aln /srv/gsfs0/projects/pipetest.tmp.txt $HOME/pipetest.tmp.txt
>     > -rw-r--r-- 1 39073 3953 530721 Feb 14 06:10 
> /home/griznog/pipetest.tmp.txt
>     > -rw-r--r-- 1 39073 3001 530721 Feb 14 06:10
>     > /srv/gsfs0/projects/pipetest.tmp.txt
>     >
>     > We can "fix" the user case that exposed this by not using a temp file or
>     > inserting a sleep, but I'd still like to know why GPFS is behaving this 
> way
>     > and make it stop.
>
>     May be related to replication, or other behind-the-scenes behavior.
>
>     Consider this example - 4.2.3.6, data and metadata replication both
>     set to 2, 2 sites 95 cable miles apart, each site 3 Dell servers with
>     a full Fibre Channel mesh to 3 Dell MD34something arrays.
>
>     % dd if=/dev/zero bs=1k count=4096 of=sync.test; ls -ls sync.test;
>     sleep 5; ls -ls sync.test; sleep 5; ls -ls sync.test
>     4096+0 records in
>     4096+0 records out
>     4194304 bytes (4.2 MB) copied, 0.0342852 s, 122 MB/s
>     2048 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test
>     8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test
>     8192 -rw-r--r-- 1 root root 4194304 Feb 14 09:35 sync.test
>
>     Notice that the first /bin/ls shouldn't be starting until after the
>     dd has
>     completed - at which point it's only allocated half the blocks
>     needed to hold
>     the 4M of data at one site.  5 seconds later, it's allocated the
>     blocks at both
>     sites and thus shows the full 8M needed for 2 copies.
>
>     I've also seen (but haven't replicated it as I write this) a small
>     file (4-8K
>     or so) showing first one full-sized block, then a second full-sized
>     block, and
>     then dropping back to what's needed for 2 1/32nd fragments.  That had me
>     scratching my head
>
>     Having said that, that's all metadata fun and games, while your case
>     appears to have some problems with data integrity (which is a whole lot
>     scarier).  It would be *really* nice if we understood the problem here.
>
>     The scariest part is:
>
>     > The first grep | wc -l returns 1, because grep outputs  "Binary file 
> /path/to/
>     > gpfs/mount/test matches"
>
>     which seems to be implying that we're failing on semantic consistency.
>     Basically, your 'cat' command is completing and closing the file, but
>     then a temporally later open of the same file is reading something
>     other than the just-written data.  My first guess is that it's a race
>     condition similar to the
>     similar to the
>     following: The cat command is causing a write on one NSD server, and
>     the first
>     grep results in a read from a *different* NSD server, returning the
>     data that
>     *used* to be in the block because the read actually happens before
>     the first
>     NSD server actually completes the write.
>
>     It may be interesting to replace the greps with pairs of 'ls -ls' and
>     'dd' commands to grab the raw data and its size, and check the
>     following:
>
>     1) does the size (both blocks allocated and logical length) reported by
>     ls match the amount of data actually read by the dd?
>
>     2) Is the file length as actually read equal to the written length,
>     or does it
>     overshoot and read all the way to the next block boundary?
>
>     3) If the length is correct, what's wrong with the data that's
>     telling grep that
>     it's a binary file?  ( od -cx is your friend here).
>
>     4) If it overshoots, is the remainder all-zeros (good) or does it
>     return semi-random
>     "what used to be there" data (bad, due to data exposure issues)?
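
Those four checks could be scripted along these lines - a rough sketch, with 
placeholder paths and a generated fallback file so it runs anywhere:

```shell
#!/bin/bash
# Sketch of the four checks above. The argument is the suspect file;
# without one it falls back to a generated local demo file.
f=${1:-/tmp/checkdemo.txt}
[ -e "$f" ] || seq 1 1000 > "$f"

# 1) blocks allocated and logical length as ls reports them
ls -ls "$f"

# 1/2) bytes actually returned by a raw sequential read vs. stat's size
read_bytes=$(dd if="$f" bs=1M 2>/dev/null | wc -c)
logical=$(stat -c %s "$f")
echo "read $read_bytes of $logical logical bytes"

# 3) anything that would trip grep's binary heuristic (usually NULs)
echo "$(tr -cd '\0' < "$f" | wc -c) NUL bytes"

# 4) eyeball the tail for overshoot or stale "used to be there" data
od -cx "$f" | tail -5
```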
>
>     (It's certainly not the most perplexing data consistency issue I've
>     hit in 4 decades - the
>     winner *has* to be an intermittent data read corruption on a GPFS 3.5
>     cluster that
>     had us, IBM, SGI, DDN, and at least one vendor of networking gear
>     all chasing our
>     tails for 18 months before we finally tracked it down. :)
>
>     _______________________________________________
>     gpfsug-discuss mailing list
>     gpfsug-discuss at spectrumscale.org
>     http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>

--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776


