Hi all

After expanding our cluster, we are seeing failures during rebalancing. This 
doesn’t look good to me, so could anybody explain how these failures arise, how 
to fix them, and what the consequences might be?

$ gluster volume rebalance public status
                Node  Rebalanced-files    size  scanned  failures  skipped       status  run time in secs
           ---------  ----------------  ------  -------  --------  -------  -----------  ----------------
           localhost                 0  0Bytes    49496     23464        0  in progress           3821.00
gfs01b-dcg.intnet.be                 0  0Bytes    49496         0        0  in progress           3821.00
gfs02a-dcg.intnet.be                 0  0Bytes    49497         0        0  in progress           3821.00
gfs02b-dcg.intnet.be                 0  0Bytes    49495         0        0  in progress           3821.00

Looking in public-rebalance.log, this is a typical excerpt. The whole log is 
filled with entries like these.

[2015-09-15 14:50:58.239554] I [dht-common.c:3309:dht_setxattr] 0-public-dht: fixing the layout of /ka1hasselt/Lqw9pnXKV8ojBzzzsqHyChSU914422947204355
[2015-09-15 14:50:58.239730] I [dht-selfheal.c:960:dht_fix_layout_of_directory] 0-public-dht: subvolume 0 (public-replicate-0): 251980 chunks
[2015-09-15 14:50:58.239750] I [dht-selfheal.c:960:dht_fix_layout_of_directory] 0-public-dht: subvolume 1 (public-replicate-1): 251980 chunks
[2015-09-15 14:50:58.239759] I [dht-selfheal.c:1065:dht_selfheal_layout_new_directory] 0-public-dht: chunk size = 0xffffffff / 503960 = 0x214a
[2015-09-15 14:50:58.239784] I [dht-selfheal.c:1103:dht_selfheal_layout_new_directory] 0-public-dht: assigning range size 0x7ffe51f8 to public-replicate-0
[2015-09-15 14:50:58.239791] I [dht-selfheal.c:1103:dht_selfheal_layout_new_directory] 0-public-dht: assigning range size 0x7ffe51f8 to public-replicate-1
[2015-09-15 14:50:58.239816] I [MSGID: 109036] [dht-common.c:6296:dht_log_new_layout_for_dir_selfheal] 0-public-dht: Setting layout of /ka1hasselt/Lqw9pnXKV8ojBzzzsqHyChSU914422947204355 with [Subvol_name: public-replicate-0, Err: -1 , Start: 0 , Stop: 2147373559 ], [Subvol_name: public-replicate-1, Err: -1 , Start: 2147373560 , Stop: 4294967295 ],
[2015-09-15 14:50:58.306701] I [dht-rebalance.c:1405:gf_defrag_migrate_data] 0-public-dht: migrate data called on /ka1hasselt/Lqw9pnXKV8ojBzzzsqHyChSU914422947204355
[2015-09-15 14:50:58.346531] W [client-rpc-fops.c:1090:client3_3_getxattr_cbk] 0-public-client-2: remote operation failed: Permission denied. Path: /ka1hasselt/Lqw9pnXKV8ojBzzzsqHyChSU914422947204355/1.1 rationale getallen.pdf (ba5220be-a462-4008-ac67-79abb16f4dd9). Key: trusted.glusterfs.pathinfo
[2015-09-15 14:50:58.354111] W [client-rpc-fops.c:1090:client3_3_getxattr_cbk] 0-public-client-3: remote operation failed: Permission denied. Path: /ka1hasselt/Lqw9pnXKV8ojBzzzsqHyChSU914422947204355/1.1 rationale getallen.pdf (ba5220be-a462-4008-ac67-79abb16f4dd9). Key: trusted.glusterfs.pathinfo
[2015-09-15 14:50:58.354166] E [dht-rebalance.c:1576:gf_defrag_migrate_data] 0-public-dht: /ka1hasselt/Lqw9pnXKV8ojBzzzsqHyChSU914422947204355/1.1 rationale getallen.pdf: failed to get trusted.distribute.linkinfo key - Permission denied
[2015-09-15 14:50:58.356191] I [dht-rebalance.c:1649:gf_defrag_migrate_data] 0-public-dht: Migration operation on dir /ka1hasselt/Lqw9pnXKV8ojBzzzsqHyChSU914422947204355 took 0.05 secs
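For reference, the layout lines in the excerpt above follow plain integer arithmetic: 2 x 251980 = 503960 chunks in total, 0xffffffff / 503960 rounds down to a chunk size of 0x214a (8522), and 251980 x 8522 = 0x7ffe51f8 (2147373560) hash values are assigned to each subvolume, with the last subvolume's range stretched to end at 0xffffffff. The Python sketch below reproduces those numbers and adds a lookup that shows which subvolume a given 32-bit hash would fall into. It is inferred from the log output only, not taken from GlusterFS code: `compute_layout` and `subvol_for_hash` are hypothetical helpers, and GlusterFS computes the actual hash of a file name with its own function, which is not reproduced here.

```python
# Sketch: reproduce the DHT layout arithmetic seen in public-rebalance.log.
# Assumption: the layout is derived exactly as the log lines suggest; this is
# not GlusterFS source code, and the helper names are hypothetical.

HASH_MAX = 0xFFFFFFFF  # top of the 32-bit hash space ("0xffffffff" in the log)


def compute_layout(chunks_per_subvol):
    """Return (chunk_size, layout) where layout holds
    (name, assigned_size, start, stop) tuples, one per subvolume."""
    total_chunks = sum(chunks for _, chunks in chunks_per_subvol)
    chunk_size = HASH_MAX // total_chunks  # integer division, as in the log
    layout = []
    start = 0
    for i, (name, chunks) in enumerate(chunks_per_subvol):
        assigned = chunks * chunk_size  # matches "assigning range size ..."
        if i == len(chunks_per_subvol) - 1:
            stop = HASH_MAX  # last subvolume absorbs the rounding remainder
        else:
            stop = start + assigned - 1
        layout.append((name, assigned, start, stop))
        start = stop + 1
    return chunk_size, layout


def subvol_for_hash(layout, h):
    """Return the subvolume whose [start, stop] range contains hash h."""
    for name, _assigned, start, stop in layout:
        if start <= h <= stop:
            return name
    raise ValueError(f"hash {h:#x} is not covered by the layout")


if __name__ == "__main__":
    chunk_size, layout = compute_layout(
        [("public-replicate-0", 251980), ("public-replicate-1", 251980)]
    )
    print(f"chunk size = {chunk_size:#x}")
    for name, assigned, start, stop in layout:
        print(f"{name}: assigned {assigned:#x}, Start: {start}, Stop: {stop}")
```

Run as a script, it prints a chunk size of 0x214a and the same Start/Stop values as the "Setting layout" log entry.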

The file referenced here, 1.1 rationale getallen.pdf, exists on the hosts behind 
0-public-client-0 and 0-public-client-1, but not on the hosts behind 
0-public-client-2 and 0-public-client-3. So another question: what is the system 
really trying to do here, and is this normal?

I really hope somebody can give me a deeper understanding of what is going on 
here.

Thanks in advance.

Kind regards
Davy
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
