On Thu, May 11, 2017 at 06:05:59PM +0530, Ravishankar N wrote:
> On 05/11/2017 05:49 PM, Niels de Vos wrote:
> > On Wed, May 10, 2017 at 09:08:03PM +0530, Pranith Kumar Karampuri wrote:
> > > On Wed, May 10, 2017 at 7:11 PM, Niels de Vos <[email protected]> wrote:
> > > 
> > > > On Wed, May 10, 2017 at 04:08:22PM +0530, Pranith Kumar Karampuri wrote:
> > > > > On Tue, May 9, 2017 at 7:40 PM, Niels de Vos <[email protected]> 
> > > > > wrote:
> > > > > 
> > > > > > ...
> > > > > > > > client from
> > > > > > > > srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
> > > > > > > > (version: 3.8.11)
> > > > > > > > [2017-05-08 10:01:06.237433] E [MSGID: 113107] [posix.c:1079:posix_seek]
> > > > > > > > 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
> > > > > > > > device or address]
> > > > > > The SEEK procedure translates to lseek() in the posix xlator. This
> > > > > > can return with "No such device or address" (ENXIO) in only one
> > > > > > case:
> > > > > >
> > > > > >      ENXIO    whence is SEEK_DATA or SEEK_HOLE, and the file offset
> > > > > >               is beyond the end of the file.
> > > > > >
> > > > > > This means that an lseek() was executed where the current offset of
> > > > > > the file descriptor was higher than the size of the file. I'm not
> > > > > > sure how that could happen... Sharding prevents using SEEK at all
> > > > > > at the moment.
> > > > > > 
> > > > > > ...
> > > > > > > > The strange part is that I cannot seem to find any other error.
> > > > > > > > If I restart the VM everything works as expected (it stopped at
> > > > > > > > ~9.51 UTC and was started at ~10.01 UTC).
> > > > > > > > 
> > > > > > > > This is not the first time that this happened, and I do not see
> > > > > > > > any problems with networking or the hosts.
> > > > > > > > 
> > > > > > > > Gluster version is 3.8.11.
> > > > > > > > This is the offending volume (though it happened on a different
> > > > > > > > one too):
> > > > > > > > Volume Name: datastore2
> > > > > > > > Type: Replicate
> > > > > > > > Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
> > > > > > > > Status: Started
> > > > > > > > Snapshot Count: 0
> > > > > > > > Number of Bricks: 1 x (2 + 1) = 3
> > > > > > > > Transport-type: tcp
> > > > > > > > Bricks:
> > > > > > > > Brick1: srvpve2g:/data/brick2/brick
> > > > > > > > Brick2: srvpve3g:/data/brick2/brick
> > > > > > > > Brick3: srvpve1g:/data/brick2/brick (arbiter)
> > > > > > > > Options Reconfigured:
> > > > > > > > nfs.disable: on
> > > > > > > > performance.readdir-ahead: on
> > > > > > > > transport.address-family: inet
> > > > > > > > 
> > > > > > > > Any hint on how to dig more deeply into the reason would be
> > > > > > > > greatly appreciated.
> > > > > > Probably the problem is with SEEK support in the arbiter
> > > > > > functionality. Just like with a READ or a WRITE on the arbiter
> > > > > > brick, SEEK can only succeed on bricks where the files with content
> > > > > > are located. It does not look like arbiter handles SEEK, so the
> > > > > > offset in lseek() will likely be higher than the size of the file
> > > > > > on the brick (an empty, 0-size file). I don't know how the
> > > > > > replication xlator responds to an error return from SEEK on one of
> > > > > > the bricks, but I doubt it likes it.
> > > > > > 
> > > > > inode-read fops don't get sent to the arbiter brick, so this won't happen.
> > > > Yes, I see that the arbiter xlator returns on reads without going to the
> > > > bricks. Should that not be done for seek as well? It's the first time I
> > > > actually looked at the code of the arbiter xlator, so I might well be
> > > > misunderstanding how it works :)
> > > > 
> > > inode-read fops are the fops which read some information from the inode,
> > > like stat/getxattr/read. Even seek falls in that category. It is not sent
> > > to the arbiter brick...
> > What confuses me is that the arbiter xlator defines the following FOPs
> > in xlators/features/arbiter/src/arbiter.c:
> AFR has a list of readable subvols on which all read-related FOPs are wound.
> For arbiter volumes, we mark the arbiter as non-readable during the lookup cbk,
> so any read FOP is not wound to the arbiter anymore. This change was made at a
> later stage, after arbiter_readv was initially coded to send an error. So in
> the current code, arbiter_readv should never get hit.

Aha! Thanks, that explains it well.
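
Just to make sure I have the picture right, here is a rough sketch of how I
now understand the winding decision (the names below are made up purely for
illustration; this is not the actual AFR code): read-type FOPs consult a
per-inode "readable" mask that the lookup callback fills in, the arbiter's
slot is never marked readable, and so arbiter_readv() never gets hit.

/* Illustration only -- not the real AFR code. It just shows the idea of a
 * "readable subvols" mask that read-type FOPs (readv, stat, seek, ...)
 * consult before winding, so the arbiter never receives them. */
#include <stdio.h>

#define CHILD_COUNT 3        /* two data bricks + one arbiter */
#define ARBITER_IDX 2

/* Filled in during the lookup callback: data bricks readable, arbiter not. */
static const int readable[CHILD_COUNT] = { 1, 1, 0 };

/* Pick the child that a read-type FOP gets wound to. */
static int pick_read_child(void)
{
        for (int i = 0; i < CHILD_COUNT; i++)
                if (readable[i])
                        return i;
        return -1;           /* no readable copy: the FOP fails */
}

int main(void)
{
        printf("read-type FOPs go to child %d, never to the arbiter (%d)\n",
               pick_read_child(), ARBITER_IDX);
        return 0;
}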

> >      struct xlator_fops fops = {
> >              .lookup = arbiter_lookup,
> >              .readv  = arbiter_readv,
> >              .truncate = arbiter_truncate,
> >              .writev = arbiter_writev,
> >              .ftruncate = arbiter_ftruncate,
> >              .fallocate = arbiter_fallocate,
> >              .discard = arbiter_discard,
> >              .zerofill = arbiter_zerofill,
> >      };
> > 
> > 
> > To go back to the error message:
> > 
> >    [posix.c:1079:posix_seek] 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such device or address]
> > 
> > We need to know on which brick this occurs to confirm that it was not
> > sent to the arbiter brick somehow.
> 
> This is what Alessandro said earlier in the thread:
> 
> "Also the seek errors where there before when there was no arbiter (only 2
> replica)."

Ok, I missed that detail. We then just need to figure out why QEMU and
FUSE try to do an lseek() with an offset of 42957209600 while the file
is not that large...
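
For reference, the error string itself is easy to reproduce outside of
Gluster. A minimal sketch (file name and sizes are only illustrative) that
triggers the same ENXIO by seeking for data past the end of a small file:

/* Minimal sketch, only to illustrate the ENXIO from the brick log:
 * lseek(SEEK_DATA) fails with "No such device or address" when the
 * requested offset lies beyond the end of the file. */
#define _GNU_SOURCE           /* for SEEK_DATA/SEEK_HOLE */
#define _FILE_OFFSET_BITS 64

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/tmp/seek-test", O_CREAT | O_RDWR, 0600);
        if (fd < 0)
                return 1;

        /* The file is only 4 KiB... */
        if (ftruncate(fd, 4096) < 0)
                return 1;

        /* ...but we ask for data at ~40 GiB, like the offset in the log. */
        if (lseek(fd, 42957209600LL, SEEK_DATA) == (off_t) -1)
                fprintf(stderr, "lseek: %s\n", strerror(errno));
                /* prints: lseek: No such device or address */

        close(fd);
        return 0;
}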

Any ideas how that can happen?

Niels
