Oh, the testing repository still has 3.5.1, so I'll wait for the Gluster devs to publish 3.5.2.
2014-08-06 10:20 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:

> On 08/06/2014 12:27 PM, Roman wrote:
>
> Yesterday I reproduced this situation twice.
>
> The setup:
> 1. Hardware and network
>    a. Disks: INTEL SSDSC2BB240G4
>    b1. Network cards: X540-AT2
>    b2. Netgear 10G switch
> 2. Software setup:
>    a. OS: Debian wheezy
>    b. GlusterFS: 3.4.4 (latest 3.4.4 from the gluster repository)
>    c. Proxmox VE with glusterfs updated from the gluster repository
> 3. Software configuration
>    a. Create a replicated volume with the options cluster.self-heal-daemon: off; nfs.disable: off; network.ping-timeout: 2.
>    b. Mount it on Proxmox VE (via the Proxmox GUI; it mounts with these options: stor1:HA-fast-150G-PVE1 on /mnt/pve/FAST-TESt type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)).
>    c. Install a VM with a qcow2 or raw disk image.
>    d. Disable the switch port / remove the network cable from one of the storage servers.
>    e. Wait, then put the cable back.
>    f. Keep waiting for the sync (pointless, it never starts).
>    g. Disable the port for the second server (or remove its cable).
>    h. Profit.
>
> Maybe I could use 3.5.2 from the debian sid (testing) repository to test with?
>
> Sure, you can go ahead. I will also write up a document about maintaining VMs on gluster from the perspective of replication.
>
> Pranith
>
> 2014-08-06 9:39 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:
>
>> Roman,
>> The file went into split-brain. I think we should do these tests with 3.5.2, where monitoring the heals is easier. Let me also come up with a document about how to do the testing you are trying to do.
>>
>> Humble/Niels,
>> Do we have debs available for 3.5.2? In 3.5.1 there was a packaging issue where /usr/bin/glfsheal was not packaged along with the deb. I think that should be fixed now as well?
>>
>> Pranith
>>
>> On 08/06/2014 11:52 AM, Roman wrote:
>>
>> good morning,
>>
>> root@stor1:~# getfattr -d -m. -e hex /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>> getfattr: Removing leading '/' from absolute path names
>> # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000001320000000000000000
>> trusted.gfid=0x23c79523075a4158bea38078da570449
>>
>> getfattr: Removing leading '/' from absolute path names
>> # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000040000000000000000
>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>> trusted.gfid=0x23c79523075a4158bea38078da570449
>>
>> 2014-08-06 9:20 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:
>>
>>> On 08/06/2014 11:30 AM, Roman wrote:
>>>
>>> Also, this time the files are not the same!
>>>
>>> root@stor1:~# md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>> 32411360c53116b96a059f17306caeda  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>
>>> root@stor2:~# md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>> 65b8a6031bcb6f5fb3a11cb1e8b1c9c9  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>
>>> What is the getfattr output?
>>>
>>> Pranith
>>>
>>> 2014-08-05 16:33 GMT+03:00 Roman <[email protected]>:
>>>
>>>> Nope, it is not working. But this time it went a bit differently:
>>>>
>>>> root@gluster-client:~# dmesg
>>>> Segmentation fault
>>>>
>>>> I was not even able to start the VM after I did the tests:
>>>>
>>>> Could not read qcow2 header: Operation not permitted
>>>>
>>>> And it seems it never starts to sync the files after the first disconnect. The VM survives the first disconnect, but not the second (I waited around 30 minutes). Also, I've got network.ping-timeout: 2 in the volume settings, but the logs reacted to the first disconnect only after around 30 seconds. The second was faster, 2 seconds.
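An aside on reading the trusted.afr values in the getfattr output above: as far as I can tell from AFR's changelog format, each trusted.afr.<volume>-client-N value packs three big-endian 32-bit counters (pending data, metadata, and entry operations that this brick has recorded against brick N). A minimal decoding sketch (the helper name is my own):

```python
def decode_afr(hex_value):
    """Split a trusted.afr.* changelog value into its three
    big-endian 32-bit counters: (data, metadata, entry) pending ops."""
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    return tuple(int.from_bytes(raw[i:i + 4], "big") for i in (0, 4, 8))

# Values from the getfattr output above:
print(decode_afr("0x000001320000000000000000"))  # (306, 0, 0) pending data ops
print(decode_afr("0x000000040000000000000000"))  # (4, 0, 0) pending data ops
```

Assuming the two output blocks are stor1 and stor2 respectively, each brick records pending data operations against the other (306 vs. 4) while showing itself clean, so neither copy can be picked as a heal source. That reading is consistent with Pranith's split-brain diagnosis and the differing md5sums.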
>>>> The reaction was different too.
>>>>
>>>> The slower one:
>>>> [2014-08-05 13:26:19.558435] W [socket.c:514:__socket_rwv] 0-glusterfs: readv failed (Connection timed out)
>>>> [2014-08-05 13:26:19.558485] W [socket.c:1962:__socket_proto_state_machine] 0-glusterfs: reading from socket failed. Error (Connection timed out), peer (10.250.0.1:24007)
>>>> [2014-08-05 13:26:21.281426] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-0: readv failed (Connection timed out)
>>>> [2014-08-05 13:26:21.281474] W [socket.c:1962:__socket_proto_state_machine] 0-HA-fast-150G-PVE1-client-0: reading from socket failed. Error (Connection timed out), peer (10.250.0.1:49153)
>>>> [2014-08-05 13:26:21.281507] I [client.c:2098:client_rpc_notify] 0-HA-fast-150G-PVE1-client-0: disconnected
>>>>
>>>> The fast one:
>>>> [2014-08-05 12:52:44.607389] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-HA-fast-150G-PVE1-client-1: server 10.250.0.2:49153 has not responded in the last 2 seconds, disconnecting.
>>>> [2014-08-05 12:52:44.607491] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-1: readv failed (No data available)
>>>> [2014-08-05 12:52:44.607585] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8) [0x7fcb1b4b0558] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fcb1b4aea63] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced unwinding frame type(GlusterFS 3.3) op(LOOKUP(27)) called at 2014-08-05 12:52:42.463881 (xid=0x381883x)
>>>> [2014-08-05 12:52:44.607604] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-HA-fast-150G-PVE1-client-1: remote operation failed: Transport endpoint is not connected. Path: / (00000000-0000-0000-0000-000000000001)
>>>> [2014-08-05 12:52:44.607736] E [rpc-clnt.c:368:saved_frames_unwind] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xf8) [0x7fcb1b4b0558] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xc3) [0x7fcb1b4aea63] (-->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fcb1b4ae97e]))) 0-HA-fast-150G-PVE1-client-1: forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2014-08-05 12:52:42.463891 (xid=0x381884x)
>>>> [2014-08-05 12:52:44.607753] W [client-handshake.c:276:client_ping_cbk] 0-HA-fast-150G-PVE1-client-1: timer must have expired
>>>> [2014-08-05 12:52:44.607776] I [client.c:2098:client_rpc_notify] 0-HA-fast-150G-PVE1-client-1: disconnected
>>>>
>>>> I've got SSD disks (just for information).
>>>> Should I go and give 3.5.2 a try?
>>>>
>>>> 2014-08-05 13:06 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:
>>>>
>>>>> Please reply along with gluster-users :-). Maybe you are hitting 'reply' instead of 'reply all'?
>>>>>
>>>>> Pranith
>>>>>
>>>>> On 08/05/2014 03:35 PM, Roman wrote:
>>>>>
>>>>> To be sure and start clean, I've created another VM with raw format and am going to repeat those steps. So now I've got two VMs, one with qcow2 format and the other with raw format. I will send another e-mail shortly.
>>>>> >>>>> >>>>> 2014-08-05 13:01 GMT+03:00 Pranith Kumar Karampuri < >>>>> [email protected]>: >>>>> >>>>>> >>>>>> On 08/05/2014 03:07 PM, Roman wrote: >>>>>> >>>>>> really, seems like the same file >>>>>> >>>>>> stor1: >>>>>> a951641c5230472929836f9fcede6b04 >>>>>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 >>>>>> >>>>>> stor2: >>>>>> a951641c5230472929836f9fcede6b04 >>>>>> /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2 >>>>>> >>>>>> >>>>>> one thing I've seen from logs, that somehow proxmox VE is >>>>>> connecting with wrong version to servers? >>>>>> [2014-08-05 09:23:45.218550] I >>>>>> [client-handshake.c:1659:select_server_supported_programs] >>>>>> 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), >>>>>> Version (330) >>>>>> >>>>>> It is the rpc (over the network data structures) version, which is >>>>>> not changed at all from 3.3 so thats not a problem. So what is the >>>>>> conclusion? Is your test case working now or not? >>>>>> >>>>>> Pranith >>>>>> >>>>>> but if I issue: >>>>>> root@pve1:~# glusterfs -V >>>>>> glusterfs 3.4.4 built on Jun 28 2014 03:44:57 >>>>>> seems ok. >>>>>> >>>>>> server use 3.4.4 meanwhile >>>>>> [2014-08-05 09:23:45.117875] I >>>>>> [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: >>>>>> accepted client from >>>>>> stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0 >>>>>> (version: >>>>>> 3.4.4) >>>>>> [2014-08-05 09:23:49.103035] I >>>>>> [server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server: >>>>>> accepted client from >>>>>> stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0 >>>>>> (version: >>>>>> 3.4.4) >>>>>> >>>>>> if this could be the reason, of course. 
>>>>>> I did restart Proxmox VE yesterday (just for information).
>>>>>>
>>>>>> 2014-08-05 12:30 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:
>>>>>>
>>>>>>> On 08/05/2014 02:33 PM, Roman wrote:
>>>>>>>
>>>>>>> Waited long enough for now; still different sizes and no logs about healing :(
>>>>>>>
>>>>>>> stor1:
>>>>>>> # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
>>>>>>>
>>>>>>> root@stor1:~# du -sh /exports/fast-test/150G/images/127/
>>>>>>> 1.2G    /exports/fast-test/150G/images/127/
>>>>>>>
>>>>>>> stor2:
>>>>>>> # file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
>>>>>>> trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
>>>>>>> trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
>>>>>>>
>>>>>>> root@stor2:~# du -sh /exports/fast-test/150G/images/127/
>>>>>>> 1.4G    /exports/fast-test/150G/images/127/
>>>>>>>
>>>>>>> According to the changelogs, the file doesn't need any healing. Could you stop the operations on the VMs and take an md5sum on both these machines?
>>>>>>>
>>>>>>> Pranith
>>>>>>>
>>>>>>> 2014-08-05 11:49 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:
>>>>>>>
>>>>>>>> On 08/05/2014 02:06 PM, Roman wrote:
>>>>>>>>
>>>>>>>> Well, it seems like it doesn't see that changes were made to the volume? I created two files, 200 and 100 MB (from /dev/zero), after I disconnected the first brick.
>>>>>>>> Then I connected it back and got these logs:
>>>>>>>>
>>>>>>>> [2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>>>>>>>> [2014-08-05 08:30:37.830207] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-HA-fast-150G-PVE1-client-0: changing port to 49153 (from 0)
>>>>>>>> [2014-08-05 08:30:37.830239] W [socket.c:514:__socket_rwv] 0-HA-fast-150G-PVE1-client-0: readv failed (No data available)
>>>>>>>> [2014-08-05 08:30:37.831024] I [client-handshake.c:1659:select_server_supported_programs] 0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>>>>>> [2014-08-05 08:30:37.831375] I [client-handshake.c:1456:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Connected to 10.250.0.1:49153, attached to remote volume '/exports/fast-test/150G'.
>>>>>>>> [2014-08-05 08:30:37.831394] I [client-handshake.c:1468:client_setvolume_cbk] 0-HA-fast-150G-PVE1-client-0: Server and Client lk-version numbers are not same, reopening the fds
>>>>>>>> [2014-08-05 08:30:37.831566] I [client-handshake.c:450:client_set_lk_version_cbk] 0-HA-fast-150G-PVE1-client-0: Server lk version = 1
>>>>>>>>
>>>>>>>> [2014-08-05 08:30:37.830150] I [glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>>>>>>>> This line seems weird to me, tbh. I do not see any traffic on the switch interfaces between the gluster servers, which means there is no syncing between them. I tried to ls -l the files on the client and servers to trigger the healing, but seemingly with no success. Should I wait more?
>>>>>>>>
>>>>>>>> Yes, it should take around 10-15 minutes. Could you provide 'getfattr -d -m. -e hex <file-on-brick>' on both the bricks?
>>>>>>>> Pranith
>>>>>>>>
>>>>>>>> 2014-08-05 11:25 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:
>>>>>>>>
>>>>>>>>> On 08/05/2014 01:10 PM, Roman wrote:
>>>>>>>>>
>>>>>>>>> Ahha! For some reason I was not able to start the VM anymore; Proxmox VE told me that it is not able to read the qcow2 header because permission was denied for some reason. So I just deleted that file and created a new VM. And the next message I got was this:
>>>>>>>>>
>>>>>>>>> Seems like these are the messages from where you took down the bricks before self-heal. Could you restart the run, waiting for self-heals to complete before taking down the next brick?
>>>>>>>>>
>>>>>>>>> Pranith
>>>>>>>>>
>>>>>>>>> [2014-08-05 07:31:25.663412] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-HA-fast-150G-PVE1-replicate-0: Unable to self-heal contents of '/images/124/vm-124-disk-1.qcow2' (possible split-brain). Please delete the file from all but the preferred subvolume. - Pending matrix: [ [ 0 60 ] [ 11 0 ] ]
>>>>>>>>> [2014-08-05 07:31:25.663955] E [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk] 0-HA-fast-150G-PVE1-replicate-0: background data self-heal failed on /images/124/vm-124-disk-1.qcow2
>>>>>>>>>
>>>>>>>>> 2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:
>>>>>>>>>
>>>>>>>>>> I just responded to your earlier mail about what the log looks like. The log appears in the mount's logfile.
>>>>>>>>>>
>>>>>>>>>> Pranith
>>>>>>>>>>
>>>>>>>>>> On 08/05/2014 12:41 PM, Roman wrote:
>>>>>>>>>>
>>>>>>>>>> Ok, so I've waited enough, I think. There was no traffic at all on the switch ports between the servers, and I could not find any suitable log message about a completed self-heal (waited about 30 minutes).
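On the "Pending matrix: [ [ 0 60 ] [ 11 0 ] ]" line in the split-brain log above: row i, column j counts the operations brick i has recorded as pending against brick j. A simplified sketch of why AFR gives up here (my own reading of the accounting; it ignores the metadata and entry changelogs, and the function name is hypothetical):

```python
def find_heal_sources(pending):
    """Return indices of bricks that no other brick accuses of pending
    ops; these are candidate heal sources. An empty list means no brick
    holds an authoritative copy, i.e. data split-brain."""
    n = len(pending)
    return [j for j in range(n)
            if all(pending[i][j] == 0 for i in range(n) if i != j)]

print(find_heal_sources([[0, 60], [11, 0]]))  # [] - both bricks accuse each other
print(find_heal_sources([[0, 4], [0, 0]]))    # [0] - brick 0 is clean, can heal brick 1
```

With [ [ 0 60 ] [ 11 0 ] ], each brick accuses the other while claiming itself clean, so self-heal has no source to copy from and the file must be resolved manually, as the log message says.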
>>>>>>>>>> Plugged out the other server's UTP cable this time and got into the same situation:
>>>>>>>>>> root@gluster-test1:~# cat /var/log/dmesg
>>>>>>>>>> -bash: /bin/cat: Input/output error
>>>>>>>>>>
>>>>>>>>>> brick logs:
>>>>>>>>>> [2014-08-05 07:09:03.005474] I [server.c:762:server_rpc_notify] 0-HA-fast-150G-PVE1-server: disconnecting connection from pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>>>>>>>>> [2014-08-05 07:09:03.005530] I [server-helpers.c:729:server_connection_put] 0-HA-fast-150G-PVE1-server: Shutting down connection pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>>>>>>>>> [2014-08-05 07:09:03.005560] I [server-helpers.c:463:do_fd_cleanup] 0-HA-fast-150G-PVE1-server: fd cleanup on /images/124/vm-124-disk-1.qcow2
>>>>>>>>>> [2014-08-05 07:09:03.005797] I [server-helpers.c:617:server_connection_destroy] 0-HA-fast-150G-PVE1-server: destroyed connection of pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
>>>>>>>>>>
>>>>>>>>>> 2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:
>>>>>>>>>>
>>>>>>>>>>> Do you think it is possible for you to do these tests on the latest version, 3.5.2? 'gluster volume heal <volname> info' would give you that information in versions > 3.5.1. Otherwise you will have to check it either from the logs (there will be a self-heal-completed message in the mount logs) or by observing 'getfattr -d -m. -e hex <image-file-on-bricks>'.
>>>>>>>>>>>
>>>>>>>>>>> Pranith
>>>>>>>>>>>
>>>>>>>>>>> On 08/05/2014 12:09 PM, Roman wrote:
>>>>>>>>>>>
>>>>>>>>>>> Ok, I understand. I will try this shortly.
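For when the tests do move to 3.5.2: `gluster volume heal <volname> info` prints a per-brick list of entries needing heal. A small helper to tally such output (the sample mimics the 3.5.x text format as I recall it, with brick paths taken from this thread; treat both as assumptions):

```python
def heal_counts(text):
    """Map each 'Brick host:/path' header to its reported entry count."""
    counts, brick = {}, None
    for line in text.splitlines():
        if line.startswith("Brick "):
            brick = line[len("Brick "):].strip()
        elif line.startswith("Number of entries:"):
            counts[brick] = int(line.rsplit(":", 1)[1])
    return counts

# Hypothetical output for the volume discussed in this thread:
sample = """\
Brick stor1:/exports/fast-test/150G
/images/127/vm-127-disk-1.qcow2
Number of entries: 1

Brick stor2:/exports/fast-test/150G
Number of entries: 0
"""
print(heal_counts(sample))  # {'stor1:/exports/fast-test/150G': 1, 'stor2:/exports/fast-test/150G': 0}
```

Healing is done when every brick reports zero entries; files that cannot be healed automatically should show up under `gluster volume heal <volname> info split-brain`.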
>>>>>>>>>>> How can I be sure that the healing process is done if I am not able to see its status?
>>>>>>>>>>>
>>>>>>>>>>> 2014-08-05 9:30 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:
>>>>>>>>>>>
>>>>>>>>>>>> Mounts will do the healing, not the self-heal daemon. The problem, I feel, is that whichever process does the healing has to have the latest information about the good bricks in this use case. Since for the VM use case the mounts should have the latest information, we should let the mounts do the healing. If the mount accesses the VM image, either by someone doing operations inside the VM or by an explicit stat on the file, it should do the healing.
>>>>>>>>>>>>
>>>>>>>>>>>> Pranith.
>>>>>>>>>>>>
>>>>>>>>>>>> On 08/05/2014 10:39 AM, Roman wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hmmm, you told me to turn it off. Did I understand something wrong? After I issued the command you sent me, I was not able to watch the healing process; it said it won't be healed because it's turned off.
>>>>>>>>>>>>
>>>>>>>>>>>> 2014-08-05 5:39 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:
>>>>>>>>>>>>
>>>>>>>>>>>>> You didn't mention anything about self-healing. Did you wait until the self-heal was complete?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Pranith
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 08/04/2014 05:49 PM, Roman wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>> The result is pretty much the same. I set the switch port down for the 1st server; it was ok. Then I set it back up and set the other server's port off, and it triggered an IO error on two virtual machines: one with a local root FS but network-mounted storage, and the other with a network root FS.
>>>>>>>>>>>>> The 1st gave an error on copying to or from the mounted network disk; the other just gave me an error even for reading log files:
>>>>>>>>>>>>>
>>>>>>>>>>>>> cat: /var/log/alternatives.log: Input/output error
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then I reset the KVM VM and it told me there is no boot device. Next I virtually powered it off and then back on, and it booted.
>>>>>>>>>>>>>
>>>>>>>>>>>>> By the way, did I have to start/stop the volume?
>>>>>>>>>>>>>
>>>>>>>>>>>>> >> Could you do the following and test it again?
>>>>>>>>>>>>> >> gluster volume set <volname> cluster.self-heal-daemon off
>>>>>>>>>>>>> >>
>>>>>>>>>>>>> >> Pranith
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-08-04 14:10 GMT+03:00 Pranith Kumar Karampuri <[email protected]>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 08/04/2014 03:33 PM, Roman wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Facing the same problem as mentioned here:
>>>>>>>>>>>>>> http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My setup is up and running, so I'm ready to help you back with feedback.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Setup:
>>>>>>>>>>>>>> Proxmox server as client
>>>>>>>>>>>>>> 2 gluster physical servers
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Server side and client side are both running glusterfs 3.4.4 from the gluster repo at the moment.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The problem is:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. Created replica bricks.
>>>>>>>>>>>>>> 2. Mounted in Proxmox (tried both Proxmox ways: via the GUI and via fstab (with a backup volume line); btw, while mounting via fstab I'm unable to launch a VM without cache, even though direct-io-mode is enabled in the fstab line).
>>>>>>>>>>>>>> 3. Installed a VM.
>>>>>>>>>>>>>> 4. Brought one volume down - ok.
>>>>>>>>>>>>>> 5. Brought it back up, waited for the sync to finish.
>>>>>>>>>>>>>> 6. Brought the other volume down - got IO errors on the VM guest and was not able to restore the VM after resetting it via the host. It says (no bootable media). After I shut it down (forced) and bring it back up, it boots.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could you do the following and test it again?
>>>>>>>>>>>>>> gluster volume set <volname> cluster.self-heal-daemon off
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Pranith
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Need help. Tried 3.4.3, 3.4.4. Still missing pkgs for 3.4.5 for debian and for 3.5.2 (3.5.1 always gives a healing error for some reason).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>> Roman.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Gluster-users mailing list
>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users

--
Best regards,
Roman.
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
