On 08/05/2014 03:07 PM, Roman wrote:
Really, it seems like the same file:
stor1:
a951641c5230472929836f9fcede6b04  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
stor2:
a951641c5230472929836f9fcede6b04  /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
One thing I've seen from the logs: it looks like Proxmox VE is somehow
connecting to the servers with the wrong version?
[2014-08-05 09:23:45.218550] I
[client-handshake.c:1659:select_server_supported_programs]
0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3, Num
(1298437), Version (330)
It is the rpc (over-the-network data structures) version, which has not
changed at all since 3.3, so that's not a problem. So what is the
conclusion? Is your test case working now or not?
Pranith
but if I issue:
root@pve1:~# glusterfs -V
glusterfs 3.4.4 built on Jun 28 2014 03:44:57
Seems OK. The servers use 3.4.4 meanwhile:
[2014-08-05 09:23:45.117875] I
[server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server:
accepted client from
stor1-9004-2014/08/05-09:23:45:93538-HA-fast-150G-PVE1-client-1-0
(version: 3.4.4)
[2014-08-05 09:23:49.103035] I
[server-handshake.c:567:server_setvolume] 0-HA-fast-150G-PVE1-server:
accepted client from
stor1-8998-2014/08/05-09:23:45:89883-HA-fast-150G-PVE1-client-0-0
(version: 3.4.4)
if this could be the reason, of course.
I did restart Proxmox VE yesterday (just for information).
2014-08-05 12:30 GMT+03:00 Pranith Kumar Karampuri
<[email protected] <mailto:[email protected]>>:
On 08/05/2014 02:33 PM, Roman wrote:
I've waited long enough for now; still different sizes and no logs
about healing :(
stor1
# file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
root@stor1:~# du -sh /exports/fast-test/150G/images/127/
1.2G /exports/fast-test/150G/images/127/
stor2
# file: exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
trusted.afr.HA-fast-150G-PVE1-client-0=0x000000000000000000000000
trusted.afr.HA-fast-150G-PVE1-client-1=0x000000000000000000000000
trusted.gfid=0xf10ad81b58484bcd9b385a36a207f921
root@stor2:~# du -sh /exports/fast-test/150G/images/127/
1.4G /exports/fast-test/150G/images/127/
According to the changelogs, the file doesn't need any healing.
Could you stop the operations on the VMs and take md5sum on both
these machines?
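For example, a minimal check on each server, using the same brick path
as in the getfattr output above:

root@stor1:~# md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2
root@stor2:~# md5sum /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2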
Pranith
2014-08-05 11:49 GMT+03:00 Pranith Kumar Karampuri
<[email protected] <mailto:[email protected]>>:
On 08/05/2014 02:06 PM, Roman wrote:
Well, it seems like it doesn't see that changes were made to
the volume? I created two files, 200 MB and 100 MB (from
/dev/zero), after I disconnected the first brick. Then I
connected it back and got these logs:
[2014-08-05 08:30:37.830150] I
[glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No
change in volfile, continuing
[2014-08-05 08:30:37.830207] I
[rpc-clnt.c:1676:rpc_clnt_reconfig]
0-HA-fast-150G-PVE1-client-0: changing port to 49153 (from 0)
[2014-08-05 08:30:37.830239] W [socket.c:514:__socket_rwv]
0-HA-fast-150G-PVE1-client-0: readv failed (No data available)
[2014-08-05 08:30:37.831024] I
[client-handshake.c:1659:select_server_supported_programs]
0-HA-fast-150G-PVE1-client-0: Using Program GlusterFS 3.3,
Num (1298437), Version (330)
[2014-08-05 08:30:37.831375] I
[client-handshake.c:1456:client_setvolume_cbk]
0-HA-fast-150G-PVE1-client-0: Connected to 10.250.0.1:49153,
attached to remote volume '/exports/fast-test/150G'.
[2014-08-05 08:30:37.831394] I
[client-handshake.c:1468:client_setvolume_cbk]
0-HA-fast-150G-PVE1-client-0: Server and Client lk-version
numbers are not same, reopening the fds
[2014-08-05 08:30:37.831566] I
[client-handshake.c:450:client_set_lk_version_cbk]
0-HA-fast-150G-PVE1-client-0: Server lk version = 1
[2014-08-05 08:30:37.830150] I
[glusterfsd-mgmt.c:1584:mgmt_getspec_cbk] 0-glusterfs: No
change in volfile, continuing
This line seems weird to me, to be honest.
I do not see any traffic on the switch interfaces between the
gluster servers, which means there is no syncing between them.
I tried to 'ls -l' the files on the client and the servers to
trigger the healing, but it seems there was no success. Should I
wait longer?
Yes, it should take around 10-15 minutes. Could you provide the
output of 'getfattr -d -m. -e hex <file-on-brick>' from both bricks?
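For example, on each brick server (using the image path from this
thread):

root@stor1:~# getfattr -d -m. -e hex /exports/fast-test/150G/images/127/vm-127-disk-1.qcow2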
Pranith
2014-08-05 11:25 GMT+03:00 Pranith Kumar Karampuri
<[email protected] <mailto:[email protected]>>:
On 08/05/2014 01:10 PM, Roman wrote:
Ahha! For some reason I was not able to start the VM
anymore; Proxmox VE told me that it cannot read the qcow2
header because permission is denied for some reason. So I
just deleted that file and created a new VM. And the next
message I got was this:
It seems these are the messages from when you took down
the bricks before the self-heal. Could you restart the run,
waiting for self-heals to complete before taking down
the next brick?
Pranith
[2014-08-05 07:31:25.663412] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-HA-fast-150G-PVE1-replicate-0: Unable to self-heal contents of '/images/124/vm-124-disk-1.qcow2' (possible split-brain). Please delete the file from all but the preferred subvolume. - Pending matrix: [ [ 0 60 ] [ 11 0 ] ]
[2014-08-05 07:31:25.663955] E [afr-self-heal-common.c:2262:afr_self_heal_completion_cbk] 0-HA-fast-150G-PVE1-replicate-0: background data self-heal failed on /images/124/vm-124-disk-1.qcow2
2014-08-05 10:13 GMT+03:00 Pranith Kumar Karampuri
<[email protected] <mailto:[email protected]>>:
I just responded to your earlier mail about what the
log looks like. The log appears in the mount's logfile.
Pranith
On 08/05/2014 12:41 PM, Roman wrote:
OK, so I've waited long enough, I think. There was no
traffic on the switch ports between the servers. I could not
find any suitable log message about a completed
self-heal (I waited about 30 minutes). This time I unplugged
the other server's UTP cable and got into
the same situation:
root@gluster-test1:~# cat /var/log/dmesg
-bash: /bin/cat: Input/output error
brick logs:
[2014-08-05 07:09:03.005474] I
[server.c:762:server_rpc_notify]
0-HA-fast-150G-PVE1-server: disconnecting
connection from
pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
[2014-08-05 07:09:03.005530] I
[server-helpers.c:729:server_connection_put]
0-HA-fast-150G-PVE1-server: Shutting down
connection
pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
[2014-08-05 07:09:03.005560] I
[server-helpers.c:463:do_fd_cleanup]
0-HA-fast-150G-PVE1-server: fd cleanup on
/images/124/vm-124-disk-1.qcow2
[2014-08-05 07:09:03.005797] I
[server-helpers.c:617:server_connection_destroy]
0-HA-fast-150G-PVE1-server: destroyed connection
of
pve1-27649-2014/08/04-13:27:54:720789-HA-fast-150G-PVE1-client-0-0
2014-08-05 9:53 GMT+03:00 Pranith Kumar Karampuri
<[email protected] <mailto:[email protected]>>:
Do you think it is possible for you to do
these tests on the latest version 3.5.2?
'gluster volume heal <volname> info' would
give you that information in versions > 3.5.1.
Otherwise you will have to check it either from
the logs (there will be a 'self-heal completed'
message in the mount logs) or by
observing 'getfattr -d -m. -e hex
<image-file-on-bricks>'.
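For example, with the volume name taken from your logs (on a
3.5.2 install):

gluster volume heal HA-fast-150G-PVE1 info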
Pranith
On 08/05/2014 12:09 PM, Roman wrote:
OK, I understand. I will try this shortly.
How can I be sure that the healing process is
done if I am not able to see its status?
2014-08-05 9:30 GMT+03:00 Pranith Kumar
Karampuri <[email protected]
<mailto:[email protected]>>:
Mounts will do the healing, not the
self-heal-daemon. The point, I feel, is
that whichever process does the healing
must have the latest information about
which bricks are good. For the VM use case
the mounts have that latest information,
so we should let the mounts do the
healing. If the mount accesses the VM
image, either through someone doing
operations inside the VM or through an
explicit stat on the file, it should do
the healing.
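For example, from the client (the Proxmox
mount path below is an assumption;
substitute your actual mount point):

stat /mnt/pve/<storage-id>/images/124/vm-124-disk-1.qcow2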
Pranith.
On 08/05/2014 10:39 AM, Roman wrote:
Hmmm, you told me to turn it off. Did I
misunderstand something? After I issued
the command you sent me, I was not able
to watch the healing process; it said it
won't be healed, because it's turned off.
2014-08-05 5:39 GMT+03:00 Pranith Kumar
Karampuri <[email protected]
<mailto:[email protected]>>:
You didn't mention anything about
self-healing. Did you wait until the
self-heal was complete?
Pranith
On 08/04/2014 05:49 PM, Roman wrote:
Hi!
The result is pretty much the same. I set
the switch port down for the 1st server;
that was OK. Then I set it back up and set
the other server's port off, and it
triggered IO errors on two virtual
machines: one with a local root FS but
network-mounted storage, and the other
with a network root FS. The 1st gave an
error on copying to or from the mounted
network disk; the other just gave me an
error even for reading log files.
cat: /var/log/alternatives.log:
Input/output error
Then I reset the KVM VM and it told me
there is no boot device. Next I virtually
powered it off and back on, and it booted.
By the way, did I have to
start/stop the volume?
>> Could you do the following and
test it again?
>> gluster volume set <volname>
cluster.self-heal-daemon off
>>Pranith
2014-08-04 14:10 GMT+03:00 Pranith
Kumar Karampuri
<[email protected]
<mailto:[email protected]>>:
On 08/04/2014 03:33 PM, Roman
wrote:
Hello!
I'm facing the same problem as
mentioned here:
http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html
My setup is up and running,
so I'm ready to help you back
with feedback.
Setup:
a Proxmox server as the client
and 2 physical gluster servers;
both the server side and the client side
are currently running glusterfs 3.4.4
from the gluster repo.
The problem is:
1. Created replica bricks.
2. Mounted them in Proxmox (tried
both Proxmox ways: via the GUI and via
fstab (with a backup volume line);
by the way, while mounting via fstab
I'm unable to launch a VM without
cache, even though direct-io-mode is
enabled in the fstab line; see the
example fstab line after this list).
3. Installed a VM.
4. Brought one volume down - OK.
5. Brought it back up and waited
for the sync to finish.
6. Brought the other volume down -
got IO errors on the VM guest and
was not able to restore the VM
after I reset it via the host; it
says "no bootable media". After I
shut it down (forced) and bring it
back up, it boots.
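For reference, a sketch of the kind
of fstab line I mean (the mount
point here is just an example;
server names are from this setup):

stor1:/HA-fast-150G-PVE1 /mnt/pve/gluster glusterfs defaults,_netdev,backupvolfile-server=stor2,direct-io-mode=enable 0 0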
Could you do the following and
test it again?
gluster volume set <volname>
cluster.self-heal-daemon off
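That is, with the volume name from
the logs filled in, something like:

gluster volume set HA-fast-150G-PVE1 cluster.self-heal-daemon off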
Pranith
Need help. I've tried 3.4.3 and
3.4.4. Packages for Debian are
still missing for 3.4.5 and for
3.5.2 (3.5.1 always gives a healing
error for some reason).
--
Best regards,
Roman.
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users