On 2/5/12 4:12 PM, Ove K. Pettersen wrote:
Hi...

Started playing with gluster. And the heal functions is my "target" for testing.

Short description of my test
----------------------------
* 4 replicas on single machine
* glusterfs mounted locally
* Create file on glusterfs-mounted directory: date >data.txt
Usually it is bad to write directly to the brick.

* Append to file on one of the bricks: hostname >>data.txt
* Trigger a self-heal with: stat data.txt
=> Modified file on the brick is still different from the three other bricks.
What a surprise :-)

Q1: Is the modified file supposed to get corrected in my case?

Q2: Is my test-case invalid for what gluster is supposed to handle?
Probably, since the file has no gfid attributes, gluster has no clue what to do with it. If you delete it, it should self-heal.

I would be very happy to receive an opinion on this.

All the details are put at the bottom


Regards,

Ove






Details and logs
----------------
My environment
* single machine with OpenSuse 12.1
* gluster 3.3beta2
* xfs is used for the brick-fs (each one to a separate disk)

Volume Name: d1
Type: Replicate
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: glusterdev:/b1
Brick2: glusterdev:/b2
Brick3: glusterdev:/b3
Brick4: glusterdev:/b4
Options Reconfigured:
diagnostics.brick-log-level: DEBUG
diagnostics.client-log-level: DEBUG

* My glustervolume is mountet as glusterfs on /d1
glusterdev:/d1           52403200   33024  52370176   1% /d1



1) I put a file on the glusterfs
date >/d1/data.txt

2) Checking the storage on the bricks: ls -l /b1 /b2 /b3 /b4
/b1:
total 8
-rw-r--r-- 1 root root 29 Feb  5 21:40 data.txt

/b2:
total 8
-rw-r--r-- 1 root root 29 Feb  5 21:40 data.txt

/b3:
total 8
-rw-r--r-- 1 root root 29 Feb  5 21:40 data.txt

/b4:
total 8
-rw-r--r-- 1 root root 29 Feb  5 21:40 data.txt


Everything is OK.

3) Then append a line on to /b1/data.txt
hostname >>/b1/data.txt

By just appending, I keep the inode and all the extended attributes (checked with attr -l on the /b1/data.txt file).


4) Check again: ls -l /b1 /b2 /b3 /b4
/b1:
total 8
-rw-r--r-- 1 root root 40 Feb  5 21:40 data.txt

/b2:
total 8
-rw-r--r-- 1 root root 29 Feb  5 21:40 data.txt

/b3:
total 8
-rw-r--r-- 1 root root 29 Feb  5 21:40 data.txt

/b4:
total 8
-rw-r--r-- 1 root root 29 Feb  5 21:40 data.txt

Now we have a mismatch between the bricks. Both in contents and also the size of the file.

4) Trigger a self-heal by using stat
cd /d1 ; stat data.txt

  File: `data.txt'
  Size: 29              Blocks: 16         IO Block: 131072 regular file
Device: 1fh/31d Inode: 18446744070822815228  Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-02-05 21:40:32.382157697 +0100
Modify: 2012-02-05 21:40:32.388157658 +0100
Change: 2012-02-05 21:40:32.389157657 +0100
 Birth: -


5) stat gives the correct size of 29 which was the size of the file stored on the glusterfs.
6) DEBUG-log from the client when running this stat-command below.

7) Check again the bricks: ls -l /b1 /b2 /b3 /b4
/b1:
total 8
-rw-r--r-- 1 root root 40 Feb  5 21:40 data.txt

/b2:
total 8
-rw-r--r-- 1 root root 29 Feb  5 21:40 data.txt

/b3:
total 8
-rw-r--r-- 1 root root 29 Feb  5 21:40 data.txt

/b4:
total 8
-rw-r--r-- 1 root root 29 Feb  5 21:40 data.txt


8) Brick /b1 still has a bad copy of the file.



In the logs it looks like the difference in size is detected.
It also starts some self-healing actions.
But then for some reason finds out no healing is needed.
Marked a few lines with <<<<< ???? which looks suspect.

===============CUT Client Log
[2012-02-05 21:41:01.206102] D [client-handshake.c:179:client_start_ping] 0-d1-client-0: returning as transport is already disconnected OR there are no frames (0 || 0) [2012-02-05 21:41:01.206201] D [client-handshake.c:179:client_start_ping] 0-d1-client-1: returning as transport is already disconnected OR there are no frames (0 || 0) [2012-02-05 21:41:01.206224] D [client-handshake.c:179:client_start_ping] 0-d1-client-2: returning as transport is already disconnected OR there are no frames (0 || 0) [2012-02-05 21:41:01.206241] D [client-handshake.c:179:client_start_ping] 0-d1-client-3: returning as transport is already disconnected OR there are no frames (0 || 0) [2012-02-05 21:41:03.279124] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.279182] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.279202] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.279218] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.279232] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 0 character innocent [2012-02-05 21:41:03.279251] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 1 character innocent [2012-02-05 21:41:03.279266] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 2 character innocent [2012-02-05 21:41:03.279280] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 3 character innocent [2012-02-05 21:41:03.279294] D [afr-self-heal-common.c:738:afr_mark_sources] 0-d1-replicate-0: Number of sources: 0 [2012-02-05 21:41:03.279309] D [afr-self-heal-data.c:836:afr_lookup_select_read_child_by_txn_type] 0-d1-replicate-0: returning read_child: 1 [2012-02-05 21:41:03.279324] D [afr-common.c:1183:afr_lookup_select_read_child] 0-d1-replicate-0: Source selected as 1 for / [2012-02-05 21:41:03.279341] D [afr-common.c:990:afr_lookup_build_response_params] 0-d1-replicate-0: Building lookup response from 1 [2012-02-05 21:41:03.279561] D [afr-common.c:1945:afr_lookup] 0-d1-replicate-0: failed to get the gfid from dict [2012-02-05 21:41:03.280265] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.280299] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.280317] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.280332] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.280346] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 0 character innocent [2012-02-05 21:41:03.280377] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 1 character innocent [2012-02-05 21:41:03.280392] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 2 character innocent [2012-02-05 21:41:03.280405] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 3 character innocent [2012-02-05 21:41:03.280419] D [afr-self-heal-common.c:738:afr_mark_sources] 0-d1-replicate-0: Number of sources: 0 [2012-02-05 21:41:03.280433] D [afr-self-heal-data.c:836:afr_lookup_select_read_child_by_txn_type] 0-d1-replicate-0: returning read_child: 2 [2012-02-05 21:41:03.280507] D [afr-common.c:1183:afr_lookup_select_read_child] 0-d1-replicate-0: Source selected as 2 for /data.txt [2012-02-05 21:41:03.280530] D [afr-common.c:990:afr_lookup_build_response_params] 0-d1-replicate-0: Building lookup response from 2 [2012-02-05 21:41:03.280548] I [afr-common.c:1076:afr_detect_self_heal_by_iatt] 0-d1-replicate-0: size differs for /data.txt [2012-02-05 21:41:03.280576] I [afr-common.c:1225:afr_launch_self_heal] 0-d1-replicate-0: background data self-heal triggered. path: /data.txt [2012-02-05 21:41:03.280612] D [afr-self-heal-metadata.c:78:afr_sh_metadata_done] 0-d1-replicate-0: proceeding to data check on /data.txt [2012-02-05 21:41:03.281307] D [afr-lk-common.c:1397:afr_nonblocking_inodelk] 0-d1-replicate-0: attempting data lock range 0 0 by 140191242417704 [2012-02-05 21:41:03.281734] D [afr-self-heal-data.c:1136:afr_sh_data_post_nonblocking_inodelk_cbk] 0-d1-replicate-0: Non Blocking data inodelks done for /data.txt by 140191242417704. Proceeding to self-heal [2012-02-05 21:41:03.282815] D [afr-self-heal-data.c:650:afr_sh_data_fix] 0-d1-replicate-0: Pending matrix for: 140191242417704 [2012-02-05 21:41:03.282896] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.282940] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.282978] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.283015] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.283050] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 0 character innocent [2012-02-05 21:41:03.283095] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 1 character innocent [2012-02-05 21:41:03.283145] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 2 character innocent [2012-02-05 21:41:03.283191] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 3 character innocent [2012-02-05 21:41:03.283229] D [afr-self-heal-common.c:738:afr_mark_sources] 0-d1-replicate-0: Number of sources: 0 <<<<<< ????? [2012-02-05 21:41:03.283264] D [afr-self-heal-data.c:657:afr_sh_data_fix] 0-d1-replicate-0: No self-heal needed for /data.txt <<<<<< ????? [2012-02-05 21:41:03.283306] D [afr-self-heal-data.c:320:afr_sh_data_finish] 0-d1-replicate-0: finishing data selfheal of /data.txt [2012-02-05 21:41:03.283344] D [afr-lk-common.c:410:transaction_lk_op] 0-d1-replicate-0: lk op is for a self heal [2012-02-05 21:41:03.283379] D [afr-lk-common.c:597:afr_unlock_inodelk] 0-d1-replicate-0: attempting data unlock range 0 0 by 140191242417704 [2012-02-05 21:41:03.283879] I [afr-self-heal-common.c:2022:afr_self_heal_completion_cbk] 0-d1-replicate-0: background data self-heal completed on /data.txt [2012-02-05 21:41:03.283960] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.284004] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.284042] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.284098] D [afr-self-heal-common.c:139:afr_sh_print_pending_matrix] 0-d1-replicate-0: pending_matrix: [ 0 0 0 0 ] [2012-02-05 21:41:03.284136] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 0 character innocent [2012-02-05 21:41:03.284173] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 1 character innocent [2012-02-05 21:41:03.284216] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 2 character innocent [2012-02-05 21:41:03.284253] D [afr-self-heal-common.c:604:afr_find_child_character_type] 0-d1-replicate-0: child 3 character innocent [2012-02-05 21:41:03.284290] D [afr-self-heal-common.c:738:afr_mark_sources] 0-d1-replicate-0: Number of sources: 0 [2012-02-05 21:41:03.284325] D [afr-self-heal-data.c:836:afr_lookup_select_read_child_by_txn_type] 0-d1-replicate-0: returning read_child: 2 [2012-02-05 21:41:03.284361] D [afr-common.c:1183:afr_lookup_select_read_child] 0-d1-replicate-0: Source selected as 2 for /data.txt [2012-02-05 21:41:03.284398] D [afr-common.c:990:afr_lookup_build_response_params] 0-d1-replicate-0: Building lookup response from 2 [2012-02-05 21:41:03.285622] D [client-lk.c:442:delete_granted_locks_fd] 0-d1-client-2: Number of locks cleared=0 [2012-02-05 21:41:03.285690] D [client-lk.c:442:delete_granted_locks_fd] 0-d1-client-1: Number of locks cleared=0 [2012-02-05 21:41:03.285779] D [client-lk.c:442:delete_granted_locks_fd] 0-d1-client-3: Number of locks cleared=0 [2012-02-05 21:41:03.285880] D [client-lk.c:442:delete_granted_locks_fd] 0-d1-client-0: Number of locks cleared=0

===============CUT Client Log





_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

Reply via email to