Just to be sure: why do you build updated glusterfs packages for wheezy if they cannot actually be installed on wheezy? :)
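Just in case it is useful to anyone else on wheezy: the conflict can be checked up front without touching the system. A minimal sketch, assuming the gluster.org wheezy repository is already in sources.list (the -s flag only simulates, nothing gets installed; glusterfs-server and glusterfs-client are the usual companion packages to the glusterfs-common named in the error quoted below):

  apt-cache policy libc6 liblvm2app2.2 librdmacm1
  apt-get install -s glusterfs-common glusterfs-server glusterfs-client

The simulated run prints the same unmet-dependency messages as quoted below, so it shows in advance whether wheezy's libc6 2.13 will block the upgrade.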
2014-08-08 9:03 GMT+03:00 Roman <[email protected]>:

> Oh, unfortunately I won't be able to install either 3.5.2 or 3.4.5 :( They both require a libc6 update, and I would not risk that.
>
> glusterfs-common : Depends: libc6 (>= 2.14) but 2.13-38+deb7u3 is to be installed
>                    Depends: liblvm2app2.2 (>= 2.02.106) but 2.02.95-8 is to be installed
>                    Depends: librdmacm1 (>= 1.0.16) but 1.0.15-1+deb7u1 is to be installed
>
>
> 2014-08-07 15:32 GMT+03:00 Roman <[email protected]>:
>
>> I'm really sorry to bother you, but it seems all my previous tests were a waste of time because of those files generated from /dev/zero :). That's good and bad news. Now I use real files for my tests. As it is almost my last workday, the only things I want to do are test and document :) .. so here are some new results.
>>
>> This time I've got two gluster volumes:
>>
>> 1. with cluster.self-heal-daemon off
>> 2. with cluster.self-heal-daemon on
>>
>> 1. Results with real files and SHD off:
>> Everything seems to work as expected. The VM survives the outage of both glusterfs servers, and I can see the sync happening via network traffic. FINE!
>>
>> Sometimes healing starts a bit late (it takes from 1 minute to 1 hour to sync). Don't know why. Ideas?
>>
>> 2. Results with SHD on:
>> The VM is not able to survive the second server's restart (as described previously). It gives I/O errors, although the files are synced. Some locks that do not let the KVM hypervisor reconnect to the storage in time?
>>
>> So the problem really is sparse files inside a VM :). If one uses them (e.g. generated from /dev/zero), the VM will crash and never come up again due to errors in the qcow2 file header. Another bug?
>>
>>
>> 2014-08-07 9:53 GMT+03:00 Roman <[email protected]>:
>>
>>> Ok, then I hope we will be able to test it two weeks from now. Thanks for your time and patience.
>>>
>>>
>>> 2014-08-07 9:49 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:
>>>
>>>> On 08/07/2014 12:17 PM, Roman wrote:
>>>>
>>>> Well, one thing is definitely true: if there is no healing daemon running, I'm not able to start the VM after the outage. It seems the qcow2 file is corrupted (KVM is unable to read its header).
>>>>
>>>> We shall see this again once I have the document with all the steps that need to be carried out :-)
>>>>
>>>> Pranith
>>>>
>>>>
>>>> 2014-08-07 9:35 GMT+03:00 Roman <[email protected]>:
>>>>
>>>>> > This should not happen if you do the writes lets say from '/dev/urandom' instead of '/dev/zero'
>>>>>
>>>>> Somewhere deep inside me I thought so! Zero is zero :)
>>>>>
>>>>> > I will provide you with a document for testing this issue properly. I have a lot going on in my day job so not getting enough time to write that out. Considering the weekend is approaching I will get a bit of time definitely over the weekend so I will send you the document over the weekend.
>>>>>
>>>>> Thank you a lot. I'll wait. My vacation starts tomorrow and I'll be out for two weeks, so there is no big hurry.
>>>>>
>>>>>
>>>>> 2014-08-07 9:26 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:
>>>>>
>>>>>> On 08/07/2014 11:48 AM, Roman wrote:
>>>>>>
>>>>>> How can they be in sync if they are different in size? And why then is the VM not able to survive a gluster outage? I really want to use glusterfs in our production for infrastructure virtualization because of its simple setup, but at this moment I can't.
Maybe you've got some testing agenda? >>>>>> Or could you list me the steps to make right tests, so our VM-s would >>>>>> survive the outages. >>>>>> >>>>>> This is because of sparse files. >>>>>> http://en.wikipedia.org/wiki/Sparse_file >>>>>> This should not happen if you do the writes lets say from >>>>>> '/dev/urandom' instead of '/dev/zero' >>>>>> >>>>>> I will provide you with a document for testing this issue properly. I >>>>>> have a lot going on in my day job so not getting enough time to write >>>>>> that >>>>>> out. Considering the weekend is approaching I will get a bit of time >>>>>> definitely over the weekend so I will send you the document over the >>>>>> weekend. >>>>>> >>>>>> Pranith >>>>>> >>>>>> >>>>>> We would like to be sure, that in situation, when one of storages >>>>>> is down, the VM-s are running - it is OK, we see this. >>>>>> We would like to be sure, that data after the server is back up is >>>>>> synced - we can't see that atm >>>>>> We would like to be sure, that VMs are failovering to the second >>>>>> storage during the outage - we can't see this atm >>>>>> :( >>>>>> >>>>>> >>>>>> 2014-08-07 9:12 GMT+03:00 Pranith Kumar Karampuri < >>>>>> [email protected]>: >>>>>> >>>>>>> >>>>>>> On 08/07/2014 11:33 AM, Roman wrote: >>>>>>> >>>>>>> File size increases because of me :) I generate files on VM from >>>>>>> /dev/zero during the outage of one server. Then I bring up the downed >>>>>>> server and it seems files never sync. I'll keep on testing today. Can't >>>>>>> read much from logs also :(. This morning both VM-s (one on volume with >>>>>>> self-healing and other on volume without it) survived second server >>>>>>> outage >>>>>>> (first server was down yesterday), while file sizes are different, VM-s >>>>>>> ran >>>>>>> without problems. But I've restarted them before bringing the second >>>>>>> gluster server down. >>>>>>> >>>>>>> Then there is no bug :-). It seems the files are already in sync >>>>>>> according to the extended attributes you have pasted. How to do you >>>>>>> test if >>>>>>> the files are in sync or not? >>>>>>> >>>>>>> Pranith >>>>>>> >>>>>>> >>>>>>> So I'm a bit lost at this moment. I'll try to keep my testings >>>>>>> ordered and write here, what will happen. >>>>>>> >>>>>>> >>>>>>> 2014-08-07 8:29 GMT+03:00 Pranith Kumar Karampuri < >>>>>>> [email protected]>: >>>>>>> >>>>>>>> >>>>>>>> On 08/07/2014 10:46 AM, Roman wrote: >>>>>>>> >>>>>>>> yes, they do. >>>>>>>> >>>>>>>> getfattr: Removing leading '/' from absolute path names >>>>>>>> # file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2 >>>>>>>> trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000 >>>>>>>> trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000 >>>>>>>> trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa >>>>>>>> >>>>>>>> root@stor1:~# du -sh >>>>>>>> /exports/pve1/1T/images/125/vm-125-disk-1.qcow2 >>>>>>>> 1.6G /exports/pve1/1T/images/125/vm-125-disk-1.qcow2 >>>>>>>> root@stor1:~# md5sum >>>>>>>> /exports/pve1/1T/images/125/vm-125-disk-1.qcow2 >>>>>>>> c117d73c9f8a2e09ef13da31f7225fa6 >>>>>>>> /exports/pve1/1T/images/125/vm-125-disk-1.qcow2 >>>>>>>> root@stor1:~# du -sh >>>>>>>> /exports/pve1/1T/images/125/vm-125-disk-1.qcow2 >>>>>>>> 1.6G /exports/pve1/1T/images/125/vm-125-disk-1.qcow2 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> root@stor2:~# getfattr -d -m. 
-e hex >>>>>>>> /exports/pve1/1T/images/125/vm-125-disk-1.qcow2 >>>>>>>> getfattr: Removing leading '/' from absolute path names >>>>>>>> # file: exports/pve1/1T/images/125/vm-125-disk-1.qcow2 >>>>>>>> trusted.afr.HA-MED-PVE1-1T-client-0=0x000000000000000000000000 >>>>>>>> trusted.afr.HA-MED-PVE1-1T-client-1=0x000000000000000000000000 >>>>>>>> trusted.gfid=0x207984df4e6e4ef983f285ed0c4ce8fa >>>>>>>> >>>>>>>> root@stor2:~# md5sum >>>>>>>> /exports/pve1/1T/images/125/vm-125-disk-1.qcow2 >>>>>>>> c117d73c9f8a2e09ef13da31f7225fa6 >>>>>>>> /exports/pve1/1T/images/125/vm-125-disk-1.qcow2 >>>>>>>> root@stor2:~# du -sh >>>>>>>> /exports/pve1/1T/images/125/vm-125-disk-1.qcow2 >>>>>>>> 2.6G /exports/pve1/1T/images/125/vm-125-disk-1.qcow2 >>>>>>>> >>>>>>>> I think the files are differing in size because of the sparse file >>>>>>>> healing issue. Could you raise a bug with steps to re-create this issue >>>>>>>> where after healing size of the file is increasing? >>>>>>>> >>>>>>>> Pranith >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 2014-08-06 12:49 GMT+03:00 Humble Chirammal <[email protected]>: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> ----- Original Message ----- >>>>>>>>> | From: "Pranith Kumar Karampuri" <[email protected]> >>>>>>>>> | To: "Roman" <[email protected]> >>>>>>>>> | Cc: [email protected], "Niels de Vos" <[email protected]>, >>>>>>>>> "Humble Chirammal" <[email protected]> >>>>>>>>> | Sent: Wednesday, August 6, 2014 12:09:57 PM >>>>>>>>> | Subject: Re: [Gluster-users] libgfapi failover problem on >>>>>>>>> replica bricks >>>>>>>>> | >>>>>>>>> | Roman, >>>>>>>>> | The file went into split-brain. I think we should do these >>>>>>>>> tests >>>>>>>>> | with 3.5.2. Where monitoring the heals is easier. Let me also >>>>>>>>> come up >>>>>>>>> | with a document about how to do this testing you are trying to >>>>>>>>> do. >>>>>>>>> | >>>>>>>>> | Humble/Niels, >>>>>>>>> | Do we have debs available for 3.5.2? In 3.5.1 there was >>>>>>>>> packaging >>>>>>>>> | issue where /usr/bin/glfsheal is not packaged along with the >>>>>>>>> deb. I >>>>>>>>> | think that should be fixed now as well? >>>>>>>>> | >>>>>>>>> Pranith, >>>>>>>>> >>>>>>>>> The 3.5.2 packages for debian is not available yet. We are >>>>>>>>> co-ordinating internally to get it processed. >>>>>>>>> I will update the list once its available. >>>>>>>>> >>>>>>>>> --Humble >>>>>>>>> | >>>>>>>>> | On 08/06/2014 11:52 AM, Roman wrote: >>>>>>>>> | > good morning, >>>>>>>>> | > >>>>>>>>> | > root@stor1:~# getfattr -d -m. 
-e hex >>>>>>>>> | > /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 >>>>>>>>> | > getfattr: Removing leading '/' from absolute path names >>>>>>>>> | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 >>>>>>>>> | > >>>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000 >>>>>>>>> | > >>>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000 >>>>>>>>> | > trusted.gfid=0x23c79523075a4158bea38078da570449 >>>>>>>>> | > >>>>>>>>> | > getfattr: Removing leading '/' from absolute path names >>>>>>>>> | > # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 >>>>>>>>> | > >>>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000 >>>>>>>>> | > >>>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000 >>>>>>>>> | > trusted.gfid=0x23c79523075a4158bea38078da570449 >>>>>>>>> | > >>>>>>>>> | > >>>>>>>>> | > >>>>>>>>> | > 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri < >>>>>>>>> [email protected] >>>>>>>>> | > <mailto:[email protected]>>: >>>>>>>>> | > >>>>>>>>> | > >>>>>>>>> | > On 08/06/2014 11:30 AM, Roman wrote: >>>>>>>>> | >> Also, this time files are not the same! >>>>>>>>> | >> >>>>>>>>> | >> root@stor1:~# md5sum >>>>>>>>> | >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 >>>>>>>>> | >> 32411360c53116b96a059f17306caeda >>>>>>>>> | >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 >>>>>>>>> | >> >>>>>>>>> | >> root@stor2:~# md5sum >>>>>>>>> | >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 >>>>>>>>> | >> 65b8a6031bcb6f5fb3a11cb1e8b1c9c9 >>>>>>>>> | >> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 >>>>>>>>> | > What is the getfattr output? >>>>>>>>> | > >>>>>>>>> | > Pranith >>>>>>>>> | > >>>>>>>>> | >> >>>>>>>>> | >> >>>>>>>>> | >> 2014-08-05 16:33 GMT+03:00 Roman <[email protected] >>>>>>>>> | >> <mailto:[email protected]>>: >>>>>>>>> | >> >>>>>>>>> | >> Nope, it is not working. But this time it went a bit >>>>>>>>> other way >>>>>>>>> | >> >>>>>>>>> | >> root@gluster-client:~# dmesg >>>>>>>>> | >> Segmentation fault >>>>>>>>> | >> >>>>>>>>> | >> >>>>>>>>> | >> I was not able even to start the VM after I done the >>>>>>>>> tests >>>>>>>>> | >> >>>>>>>>> | >> Could not read qcow2 header: Operation not permitted >>>>>>>>> | >> >>>>>>>>> | >> And it seems, it never starts to sync files after >>>>>>>>> first >>>>>>>>> | >> disconnect. VM survives first disconnect, but not >>>>>>>>> second (I >>>>>>>>> | >> waited around 30 minutes). Also, I've >>>>>>>>> | >> got network.ping-timeout: 2 in volume settings, but >>>>>>>>> logs >>>>>>>>> | >> react on first disconnect around 30 seconds. Second >>>>>>>>> was >>>>>>>>> | >> faster, 2 seconds. >>>>>>>>> | >> >>>>>>>>> | >> Reaction was different also: >>>>>>>>> | >> >>>>>>>>> | >> slower one: >>>>>>>>> | >> [2014-08-05 13:26:19.558435] W >>>>>>>>> [socket.c:514:__socket_rwv] >>>>>>>>> | >> 0-glusterfs: readv failed (Connection timed out) >>>>>>>>> | >> [2014-08-05 13:26:19.558485] W >>>>>>>>> | >> [socket.c:1962:__socket_proto_state_machine] >>>>>>>>> 0-glusterfs: >>>>>>>>> | >> reading from socket failed. 
Error (Connection timed >>>>>>>>> out), >>>>>>>>> | >> peer (10.250.0.1:24007 <http://10.250.0.1:24007>) >>>>>>>>> | >> [2014-08-05 13:26:21.281426] W >>>>>>>>> [socket.c:514:__socket_rwv] >>>>>>>>> | >> 0-HA-fast-150G-PVE1-client-0: readv failed >>>>>>>>> (Connection timed out) >>>>>>>>> | >> [2014-08-05 13:26:21.281474] W >>>>>>>>> | >> [socket.c:1962:__socket_proto_state_machine] >>>>>>>>> | >> 0-HA-fast-150G-PVE1-client-0: reading from socket >>>>>>>>> failed. >>>>>>>>> | >> Error (Connection timed out), peer (10.250.0.1:49153 >>>>>>>>> | >> <http://10.250.0.1:49153>) >>>>>>>>> | >> [2014-08-05 13:26:21.281507] I >>>>>>>>> | >> [client.c:2098:client_rpc_notify] >>>>>>>>> | >> 0-HA-fast-150G-PVE1-client-0: disconnected >>>>>>>>> | >> >>>>>>>>> | >> the fast one: >>>>>>>>> | >> 2014-08-05 12:52:44.607389] C >>>>>>>>> | >> [client-handshake.c:127:rpc_client_ping_timer_expired] >>>>>>>>> | >> 0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153 >>>>>>>>> | >> <http://10.250.0.2:49153> has not responded in the >>>>>>>>> last 2 >>>>>>>>> | >> seconds, disconnecting. >>>>>>>>> | >> [2014-08-05 12:52:44.607491] W >>>>>>>>> [socket.c:514:__socket_rwv] >>>>>>>>> | >> 0-HA-fast-150G-PVE1-client-1: readv failed (No data >>>>>>>>> available) >>>>>>>>> | >> [2014-08-05 12:52:44.607585] E >>>>>>>>> | >> [rpc-clnt.c:368:saved_frames_unwind] >>>>>>>>> | >> >>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8) >>>>>>>>> | >> [0x7fcb1b4b0558] >>>>>>>>> | >> >>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) >>>>>>>>> | >> [0x7fcb1b4aea63] >>>>>>>>> | >> >>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe) >>>>>>>>> | >> [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: >>>>>>>>> forced >>>>>>>>> | >> unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) >>>>>>>>> called at >>>>>>>>> | >> 2014-08-05 12:52:42.463881 (xid=0x381883x) >>>>>>>>> | >> [2014-08-05 12:52:44.607604] W >>>>>>>>> | >> [client-rpc-fops.c:2624:client3_3_lookup_cbk] >>>>>>>>> | >> 0-HA-fast-150G-PVE1-client-1: remote operation failed: >>>>>>>>> | >> Transport endpoint is not connected. Path: / >>>>>>>>> | >> (00000000-0000-0000-0000-000000000001) >>>>>>>>> | >> [2014-08-05 12:52:44.607736] E >>>>>>>>> | >> [rpc-clnt.c:368:saved_frames_unwind] >>>>>>>>> | >> >>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8) >>>>>>>>> | >> [0x7fcb1b4b0558] >>>>>>>>> | >> >>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) >>>>>>>>> | >> [0x7fcb1b4aea63] >>>>>>>>> | >> >>>>>>>>> (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe) >>>>>>>>> | >> [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: >>>>>>>>> forced >>>>>>>>> | >> unwinding frame type(GlusterFS Handshake) op(PING(3)) >>>>>>>>> called >>>>>>>>> | >> at 2014-08-05 12:52:42.463891 (xid=0x381884x) >>>>>>>>> | >> [2014-08-05 12:52:44.607753] W >>>>>>>>> | >> [client-handshake.c:276:client_ping_cbk] >>>>>>>>> | >> 0-HA-fast-150G-PVE1-client-1: timer must have expired >>>>>>>>> | >> [2014-08-05 12:52:44.607776] I >>>>>>>>> | >> [client.c:2098:client_rpc_notify] >>>>>>>>> | >> 0-HA-fast-150G-PVE1-client-1: disconnected >>>>>>>>> | >> >>>>>>>>> | >> >>>>>>>>> | >> >>>>>>>>> | >> I've got SSD disks (just for an info). >>>>>>>>> | >> Should I go and give a try for 3.5.2? 
>>>>>>>>> | >> >>>>>>>>> | >> >>>>>>>>> | >> >>>>>>>>> | >> 2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri >>>>>>>>> | >> <[email protected] <mailto:[email protected]>>: >>>>>>>>> | >> >>>>>>>>> | >> reply along with gluster-users please :-). May be >>>>>>>>> you are >>>>>>>>> | >> hitting 'reply' instead of 'reply all'? >>>>>>>>> | >> >>>>>>>>> | >> Pranith >>>>>>>>> | >> >>>>>>>>> | >> On 08/05/2014 03:35 PM, Roman wrote: >>>>>>>>> | >>> To make sure and clean, I've created another VM >>>>>>>>> with raw >>>>>>>>> | >>> format and goint to repeat those steps. So now >>>>>>>>> I've got >>>>>>>>> | >>> two VM-s one with qcow2 format and other with raw >>>>>>>>> | >>> format. I will send another e-mail shortly. >>>>>>>>> | >>> >>>>>>>>> | >>> >>>>>>>>> | >>> 2014-08-05 13:01 GMT+03:00 Pranith Kumar >>>>>>>>> Karampuri >>>>>>>>> | >>> <[email protected] <mailto: >>>>>>>>> [email protected]>>: >>>>>>>>> | >>> >>>>>>>>> | >>> >>>>>>>>> | >>> On 08/05/2014 03:07 PM, Roman wrote: >>>>>>>>> | >>>> really, seems like the same file >>>>>>>>> | >>>> >>>>>>>>> | >>>> stor1: >>>>>>>>> | >>>> a951641c5230472929836f9fcede6b04 >>>>>>>>> | >>>> >>>>>>>>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 >>>>>>>>> | >>>> >>>>>>>>> | >>>> stor2: >>>>>>>>> | >>>> a951641c5230472929836f9fcede6b04 >>>>>>>>> | >>>> >>>>>>>>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 >>>>>>>>> | >>>> >>>>>>>>> | >>>> >>>>>>>>> | >>>> one thing I've seen from logs, that somehow >>>>>>>>> proxmox >>>>>>>>> | >>>> VE is connecting with wrong version to >>>>>>>>> servers? >>>>>>>>> | >>>> [2014-08-05 09:23:45.218550] I >>>>>>>>> | >>>> >>>>>>>>> [client-handshake.c:1659:select_server_supported_programs] >>>>>>>>> | >>>> 0-HA-fast-150G-PVE1-client-0: Using Program >>>>>>>>> | >>>> GlusterFS 3.3, Num (1298437), Version (330) >>>>>>>>> | >>> It is the rpc (over the network data >>>>>>>>> structures) >>>>>>>>> | >>> version, which is not changed at all from >>>>>>>>> 3.3 so >>>>>>>>> | >>> thats not a problem. So what is the >>>>>>>>> conclusion? Is >>>>>>>>> | >>> your test case working now or not? >>>>>>>>> | >>> >>>>>>>>> | >>> Pranith >>>>>>>>> | >>> >>>>>>>>> | >>>> but if I issue: >>>>>>>>> | >>>> root@pve1:~# glusterfs -V >>>>>>>>> | >>>> glusterfs 3.4.4 built on Jun 28 2014 >>>>>>>>> 03:44:57 >>>>>>>>> | >>>> seems ok. >>>>>>>>> | >>>> >>>>>>>>> | >>>> server use 3.4.4 meanwhile >>>>>>>>> | >>>> [2014-08-05 09:23:45.117875] I >>>>>>>>> | >>>> [server-handshake.c:567:server_setvolume] >>>>>>>>> | >>>> 0-HA-fast-150G-PVE1-server: accepted client >>>>>>>>> from >>>>>>>>> | >>>> >>>>>>>>> stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0 >>>>>>>>> | >>>> (version: 3.4.4) >>>>>>>>> | >>>> [2014-08-05 09:23:49.103035] I >>>>>>>>> | >>>> [server-handshake.c:567:server_setvolume] >>>>>>>>> | >>>> 0-HA-fast-150G-PVE1-server: accepted client >>>>>>>>> from >>>>>>>>> | >>>> >>>>>>>>> stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0 >>>>>>>>> | >>>> (version: 3.4.4) >>>>>>>>> | >>>> >>>>>>>>> | >>>> if this could be the reason, of course. 
>>>>>>>>> | >>>> I did restart the Proxmox VE yesterday >>>>>>>>> (just for an >>>>>>>>> | >>>> information) >>>>>>>>> | >>>> >>>>>>>>> | >>>> >>>>>>>>> | >>>> >>>>>>>>> | >>>> >>>>>>>>> | >>>> >>>>>>>>> | >>>> 2014-08-05 12:30 GMT+03:00 Pranith Kumar >>>>>>>>> Karampuri >>>>>>>>> | >>>> <[email protected] <mailto: >>>>>>>>> [email protected]>>: >>>>>>>>> | >>>> >>>>>>>>> | >>>> >>>>>>>>> | >>>> On 08/05/2014 02:33 PM, Roman wrote: >>>>>>>>> | >>>>> Waited long enough for now, still >>>>>>>>> different >>>>>>>>> | >>>>> sizes and no logs about healing :( >>>>>>>>> | >>>>> >>>>>>>>> | >>>>> stor1 >>>>>>>>> | >>>>> # file: >>>>>>>>> | >>>>> >>>>>>>>> exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 >>>>>>>>> | >>>>> >>>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000 >>>>>>>>> | >>>>> >>>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000 >>>>>>>>> | >>>>> >>>>>>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921 >>>>>>>>> | >>>>> >>>>>>>>> | >>>>> root@stor1:~# du -sh >>>>>>>>> | >>>>> /exports/fast-test/150G/images/127/ >>>>>>>>> | >>>>> 1.2G >>>>>>>>> /exports/fast-test/150G/images/127/ >>>>>>>>> | >>>>> >>>>>>>>> | >>>>> >>>>>>>>> | >>>>> stor2 >>>>>>>>> | >>>>> # file: >>>>>>>>> | >>>>> >>>>>>>>> exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 >>>>>>>>> | >>>>> >>>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000 >>>>>>>>> | >>>>> >>>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000 >>>>>>>>> | >>>>> >>>>>>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921 >>>>>>>>> | >>>>> >>>>>>>>> | >>>>> >>>>>>>>> | >>>>> root@stor2:~# du -sh >>>>>>>>> | >>>>> /exports/fast-test/150G/images/127/ >>>>>>>>> | >>>>> 1.4G >>>>>>>>> /exports/fast-test/150G/images/127/ >>>>>>>>> | >>>> According to the changelogs, the file >>>>>>>>> doesn't >>>>>>>>> | >>>> need any healing. Could you stop the >>>>>>>>> operations >>>>>>>>> | >>>> on the VMs and take md5sum on both >>>>>>>>> these machines? >>>>>>>>> | >>>> >>>>>>>>> | >>>> Pranith >>>>>>>>> | >>>> >>>>>>>>> | >>>>> >>>>>>>>> | >>>>> >>>>>>>>> | >>>>> >>>>>>>>> | >>>>> >>>>>>>>> | >>>>> 2014-08-05 11:49 GMT+03:00 Pranith >>>>>>>>> Kumar >>>>>>>>> | >>>>> Karampuri <[email protected] >>>>>>>>> | >>>>> <mailto:[email protected]>>: >>>>>>>>> | >>>>> >>>>>>>>> | >>>>> >>>>>>>>> | >>>>> On 08/05/2014 02:06 PM, Roman >>>>>>>>> wrote: >>>>>>>>> | >>>>>> Well, it seems like it doesn't >>>>>>>>> see the >>>>>>>>> | >>>>>> changes were made to the volume ? >>>>>>>>> I >>>>>>>>> | >>>>>> created two files 200 and 100 MB >>>>>>>>> (from >>>>>>>>> | >>>>>> /dev/zero) after I disconnected >>>>>>>>> the first >>>>>>>>> | >>>>>> brick. 
Then connected it back and got these logs:
>>>>>>>>> | >>>>>>
>>>>>>>>> | >>>>>> [2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>>>>>>>>> | >>>>>> [2014-08-05 08:30:37.830207] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-HA-fast-150G-PVE1-client-0: changing port to 49153 (from 0)
>>>>>>>>> | >>>>>> [2014-08-05 08:30:37.830239] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-0: readv failed (No data available)
>>>>>>>>> | >>>>>> [2014-08-05 08:30:37.831024] I [client-handshake.c:1659:select_server_supported_programs] 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>>>>>>> | >>>>>> [2014-08-05 08:30:37.831375] I [client-handshake.c:1456:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Connected to 10.250.0.1:49153, attached to remote volume '/exports/fast-test/150G'.
>>>>>>>>> | >>>>>> [2014-08-05 08:30:37.831394] I [client-handshake.c:1468:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Server and Client lk-version numbers are not same, reopening the fds
>>>>>>>>> | >>>>>> [2014-08-05 08:30:37.831566] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-fast-150G-PVE1-client-0: Server lk version = 1
>>>>>>>>> | >>>>>>
>>>>>>>>> | >>>>>> [2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>>>>>>>>> | >>>>>> this line seems weird to me tbh.
>>>>>>>>> | >>>>>> I do not see any traffic on switch interfaces between gluster servers, which
>>>>>>>> ...
>>>
>>> [Message not shown in full]
>>
>>
>> --
>> Best regards,
>> Roman.
>>
>
>
> --
> Best regards,
> Roman.
>

--
Best regards,
Roman.
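One more note on the sparse-file point from the thread above (the Wikipedia link Pranith quoted): a file whose data is mostly zeros can be stored sparsely, so du (allocated blocks) and ls -l (apparent size) disagree, and two bricks can report different du sizes even when the content is identical. A minimal sketch with throwaway file names, nothing gluster-specific, just to see the effect:

  # apparent size 64M, no blocks actually allocated (sparse)
  dd if=/dev/zero of=sparse.img bs=1M count=0 seek=64
  # 64M of real data, every block written out
  dd if=/dev/urandom of=dense.img bs=1M count=64
  ls -lh sparse.img dense.img   # both report 64M apparent size
  du -h  sparse.img dense.img   # allocated sizes differ (0 vs 64M)

This is why comparing md5sums across the bricks, as done above, is the reliable check, while du alone can make the replicas look out of sync.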
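For reading the getfattr output that keeps coming up above: as far as I understand it, each trusted.afr.<volume>-client-N value is three 32-bit counters in hex (data / metadata / entry changelog), counting operations still pending against the other replica. A rough key, spaces added only for readability:

  getfattr -d -m. -e hex /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
  # trusted.afr.<vol>-client-1 = 0x 00000132 00000000 00000000
  #                                  data    metadata entry
  # all zero on both bricks            -> nothing pending, replicas considered in sync
  # non-zero data counter on one brick -> that brick has writes the other still needs (heal pending)
  # non-zero on both bricks against
  # each other                         -> split-brain territory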
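And since watching heals through file sizes turned out to be misleading, a few CLI commands that may make this kind of testing easier, if I remember the 3.4 CLI correctly (volume name taken from the logs above; on the volume with cluster.self-heal-daemon off the heal commands may refuse to run, since they go through the self-heal daemon):

  gluster volume heal HA-fast-150G-PVE1 info               # entries still pending heal
  gluster volume heal HA-fast-150G-PVE1 info split-brain   # entries AFR considers split-brain
  gluster volume heal HA-fast-150G-PVE1 full               # force a full crawl if the daemon looks idle
  gluster volume set  HA-fast-150G-PVE1 network.ping-timeout 42   # back to the default; 2 seconds is very aggressive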
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
