Hi,

we're trying to set up a web caching application on top of GlusterFS, which 
is supposed to provide shared storage for a bunch of servers (two right 
now).  We have a similar setup running on NFS, which works okay.  GlusterFS, 
however, is giving us some headaches.

We use two servers ($GLUFS001 and $GLUFS002) in AWS, each with its brick 
file system mounted at /mnt/brick01/ and a vol01/ subdirectory used as the 
brick.  We have tried both XFS and EXT4 as the underlying FS.  The volume is 
set up with:

sudo gluster volume create tst_vol_aws01 replica 2 transport tcp 
"$GLUFS001:/mnt/brick01/vol01/" "$GLUFS002:/mnt/brick01/vol01/"

This is the resulting volume info:

Volume Name: tst_vol_aws01
Type: Replicate
Volume ID: f4b060fa-d6a8-45d2-89c5-1d99637dcc8a
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: $GLUFS001:/mnt/brick01/vol01
Brick2: $GLUFS002:/mnt/brick01/vol01
Options Reconfigured:
performance.cache-size: 256MB

The clients run on the very same machines, mounting the volume with glusterfs:

sudo mount -t glusterfs localhost:/tst_vol_aws01 /mnt/tst_vol_aws

which is used as storage by the web cache application.  On a cache miss the 
application fetches the object from an origin and places it into the storage, 
both to reuse it on the next request and to make it available to the other 
web cache.  It does so by writing the file to a temporary, per-client unique 
location (eg. /mnt/tst_vol_aws/tmp/$IPADDR/) and then moving it into its 
final location (below /mnt/tst_vol_aws/sat_p_cache/).
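In case it matters, we currently mount by hand; the equivalent fstab entry 
(untested sketch) would be:

```
localhost:/tst_vol_aws01  /mnt/tst_vol_aws  glusterfs  defaults,_netdev  0  0
```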

As long as I send requests to only one of the instances, everything is fine.  
The trouble starts when I send requests to both of them and cause concurrent 
writes to the storage.  Though unlikely, it can happen that two clients try 
to put the very same file into the cache.  Needless to say, they fetch the 
same content from the origin, so it is not a problem if one overwrites the 
other's version.  The error our application receives is:

pread() "/mnt/tst_vol_aws/sat_p_cache/9_158_309/18/169/80930_158240" failed 
(22: Invalid argument)
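For context on why we expect the overwrite itself to be harmless: the final 
move is a rename(2), which atomically replaces an existing destination.  A 
minimal local-FS sketch of the pattern (paths made up, no GlusterFS involved) 
behaves exactly as we'd hope:

```shell
#!/bin/sh
# Two "clients" stage identical content in private tmp dirs, then
# both rename it to the same final name.  rename(2), which mv uses
# here, atomically replaces the destination, so whichever writer
# loses the race is overwritten without leaving a partial file.
set -e
work="$(mktemp -d)"
mkdir -p "$work/tmp/clientA" "$work/tmp/clientB" "$work/cache"

printf 'same content\n' >"$work/tmp/clientA/obj"
printf 'same content\n' >"$work/tmp/clientB/obj"

mv "$work/tmp/clientA/obj" "$work/cache/obj"  # first writer wins...
mv "$work/tmp/clientB/obj" "$work/cache/obj"  # ...second silently replaces it

cat "$work/cache/obj"  # -> same content
```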

Around that I find various messages in /var/log/glusterfs/mnt-tst_vol_aws.log:

[2013-07-25 07:08:36.859467] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 
0-tst_vol_aws01-client-1: remote operation failed: Stale file handle. 
Path: /sat_p_cache/9_158_309/19/682/161796_316418 
(c19f0726-2c01-420f-b10f-926f8f55de43)
[2013-07-25 07:08:36.860412] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 
0-tst_vol_aws01-client-0: remote operation failed: Stale file handle. 
Path: /sat_p_cache/9_158_309/19/682/161796_316418 
(c19f0726-2c01-420f-b10f-926f8f55de43)
[2013-07-25 07:09:44.035227] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 
0-tst_vol_aws01-client-1: remote operation failed: Stale file handle. 
Path: /sat_p_cache/9_158_309 (3628f18c-48c8-4b07-91cf-ff1922ab45f8)
[2013-07-25 07:09:44.035669] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 
0-tst_vol_aws01-client-0: remote operation failed: Stale file handle. 
Path: /sat_p_cache/9_158_309 (3628f18c-48c8-4b07-91cf-ff1922ab45f8)
[2013-07-25 07:09:44.885802] I 
[afr-self-heal-common.c:1213:afr_sh_missing_entry_call_impunge_recreate] 
0-tst_vol_aws01-replicate-0: no missing files - 
/sat_p_cache/9_158_309/18/169/80930_158240. proceeding to metadata check
[2013-07-25 07:09:44.892560] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 
0-tst_vol_aws01-client-1: remote operation failed: File exists. Path: 
/sat_p_cache/9_158_309/18/169/80930_158240 
(00000000-0000-0000-0000-000000000000)
[2013-07-25 07:09:44.892994] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 
0-tst_vol_aws01-client-0: remote operation failed: File exists. Path: 
/sat_p_cache/9_158_309/18/169/80930_158240 
(00000000-0000-0000-0000-000000000000)
[2013-07-25 07:09:44.893073] E [dht-helper.c:1052:dht_inode_ctx_get] 
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75)
 [0x7f541cd61025] 
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e)
 [0x7f541cd4801e] 
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34)
 [0x7f541cd49364]))) 0-tst_vol_aws01-dht: invalid argument: inode
[2013-07-25 07:09:44.893106] E [dht-helper.c:1071:dht_inode_ctx_set] 
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75)
 [0x7f541cd61025] 
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e)
 [0x7f541cd4801e] 
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52)
 [0x7f541cd49382]))) 0-tst_vol_aws01-dht: invalid argument: inode
[2013-07-25 07:09:44.893476] W [page.c:991:__ioc_page_error] 
0-tst_vol_aws01-io-cache: page error for page = 0x7f54100556c0 & waitq = 
0x7f5410048b50
[2013-07-25 07:09:44.893527] W [fuse-bridge.c:2049:fuse_readv_cbk] 
0-glusterfs-fuse: 5663: READ => -1 (Invalid argument)
[2013-07-25 07:09:44.896395] W [fuse-bridge.c:875:fuse_fd_cbk] 
0-glusterfs-fuse: 5665: OPEN() /sat_p_cache/9_158_309/18/169/80930_158240 => -1 
(Invalid argument)
[2013-07-25 07:09:44.896524] W [defaults.c:1291:default_release] 
(-->/usr/lib64/glusterfs/3.4.0/xlator/performance/open-behind.so(open_and_resume+0xa8)
 [0x7f541c30c288] (-->/usr/lib64/libglusterfs.so.0(call_stub_destroy+0x142) 
[0x7f54231cbbb2] (-->/usr/lib64/libglusterfs.so.0(fd_unref+0x144) 
[0x7f54231d8604]))) 0-fuse: xlator does not implement release_cbk
[2013-07-25 07:09:45.381747] I 
[afr-self-heal-common.c:1213:afr_sh_missing_entry_call_impunge_recreate] 
0-tst_vol_aws01-replicate-0: no missing files - 
/sat_p_cache/9_158_309/19/674/161796_316486. proceeding to metadata check

I think the message our application sees is caused by this "invalid argument: 
inode", but the other warnings are not exactly encouraging either.  I can 
easily reproduce the problem without our service using this simple shell 
script:

#!/bin/sh

# Use this client's own IP address to build a per-client tmp dir,
# mirroring what the cache application does.
ip="`/sbin/ip a s eth0 | sed -n '/inet /{s%^.*inet %%;s%/.*$%%;p}'`"

tmpdir="/mnt/tst_vol_aws/tmp/$ip/foo/"
dstdir="/mnt/tst_vol_aws/sat_p_cache/foo/"

mkdir -p "$tmpdir" "$dstdir"

while true
do
        # Write locally, then rename into the shared destination.
        date >"$tmpdir/foo.$$"
        mv -v "$tmpdir/foo.$$" "$dstdir/foo.`date +%s`"
done

The clients alternately get:

mv: accessing `/mnt/tst_vol_aws/sat_p_cache/foo//foo.1374736742': Structure 
needs cleaning
mv: accessing `/mnt/tst_vol_aws/sat_p_cache/foo//foo.1374736742': Structure 
needs cleaning
[the same message repeats for every iteration]

errors, and I get the same garbage in the GlusterFS log files:

[2013-07-25 07:19:00.018670] E 
[afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 
0-tst_vol_aws01-replicate-0: background  meta-data data entry missing-entry 
gfid self-heal failed on /sat_p_cache/foo/foo.1374736740
[2013-07-25 07:19:00.019713] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 
0-tst_vol_aws01-client-0: remote operation failed: File exists. Path: 
/sat_p_cache/foo/foo.1374736740 (00000000-0000-0000-0000-000000000000)
[2013-07-25 07:19:00.020364] W [client-rpc-fops.c:259:client3_3_mknod_cbk] 
0-tst_vol_aws01-client-1: remote operation failed: File exists. Path: 
/sat_p_cache/foo/foo.1374736740 (00000000-0000-0000-0000-000000000000)
[2013-07-25 07:19:00.020440] E [dht-helper.c:1052:dht_inode_ctx_get] 
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75)
 [0x7f73242c0025] 
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e)
 [0x7f73242a701e] 
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34)
 [0x7f73242a8364]))) 0-tst_vol_aws01-dht: invalid argument: inode
[2013-07-25 07:19:00.020472] E [dht-helper.c:1071:dht_inode_ctx_set] 
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_lookup_linkfile_create_cbk+0x75)
 [0x7f73242c0025] 
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_layout_preset+0x5e)
 [0x7f73242a701e] 
(-->/usr/lib64/glusterfs/3.4.0/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52)
 [0x7f73242a8382]))) 0-tst_vol_aws01-dht: invalid argument: inode
[2013-07-25 07:19:00.172323] I [afr-self-heal-entry.c:2253:afr_sh_entry_fix] 
0-tst_vol_aws01-replicate-0: /sat_p_cache/foo: Performing conservative merge
[2013-07-25 07:19:01.068257] W [client-rpc-fops.c:526:client3_3_stat_cbk] 
0-tst_vol_aws01-client-0: remote operation failed: No such file or directory

Any advice on this?  What am I doing wrong?  If required, I can provide a 
simple shell script showing how to set this up in AWS to reproduce the 
problem.  Thanks a lot in advance.

Kind regards,
 Frank.
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
