Hi John, Greg, Zheng

And now a much more relevant problem. Once again, my environment:

- ceph/cephfs in 10.2.2 but patched for 
  o client: add missing client_lock for get_root 
(https://github.com/ceph/ceph/pull/10027)
  o Jewel: segfault in ObjectCacher::FlusherThread 
(http://tracker.ceph.com/issues/16610)
- All infrastructure is in the same version (rados cluster, mons, mds and 
cephfs clients).
- We mount cephfs using ceph-fuse.

Once we enabled quota on the clients (by using --client-quota) and exposed the 
filesystem to the workload of our users, we got systematic segfaults. We are 
able to reproduce them every time by asking the user to launch the same 
workload. Please note that the segfault systematically happens on the clients 
where quota is enabled. We still have a few clients where it was not possible 
to remount cephfs (because they were being heavily used), and on those no 
segfaults happen.

Running ceph-fuse in debug mode with 'debug client = 20', we got:

     0> 2016-12-06 03:45:45.102680 7f40d3fff700 -1 client/Client.cc: In 
function 'Inode* Client::get_quota_root(Inode*)' thread 7f40d3fff700 time 
2016-12-06 03:45:45.101937
client/Client.cc: 12049: FAILED assert(root_ancestor->qtree == __null)

 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) 
[0x7f410845671b]
 2: (Client::get_quota_root(Inode*)+0x7c5) [0x7f4108291315]
 3: (Client::check_quota_condition(Inode*, std::function<bool (Inode 
const&)>)+0x3d) [0x7f410829193d]
 4: (Client::is_quota_bytes_exceeded(Inode*, long)+0x6e) [0x7f4108291ade]
 5: (Client::_write(Fh*, long, unsigned long, char const*, iovec const*, 
int)+0xce3) [0x7f41082a7323]
 6: (Client::ll_write(Fh*, long, long, char const*)+0x94) [0x7f41082a87b4]
 7: (()+0x197b46) [0x7f4108262b46]
 8: (()+0x15294) [0x7f4107c82294]
 9: (()+0x15b76) [0x7f4107c82b76]
 10: (()+0x12aa9) [0x7f4107c7faa9]
 11: (()+0x3db6a07aa1) [0x7f4106db8aa1]
 12: (clone()+0x6d) [0x7f4106046aad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
interpret this.
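
In case it helps anyone triaging this, here is a minimal Python sketch that pulls the frames out of a backtrace pasted from an email like this one, re-joining the lines the mail client wrapped. The function name parse_frames and the regular expression are my own illustration, not part of any Ceph tooling, and the pattern only assumes the "N: (symbol+0xOFFSET) [0xADDRESS]" layout seen above.

```python
import re

# One complete frame, e.g.
#   "2: (Client::get_quota_root(Inode*)+0x7c5) [0x7f4108291315]"
# Layout assumed from the log above; not an official Ceph format definition.
FRAME_RE = re.compile(
    r"^(?P<index>\d+): \((?P<symbol>.*)\+(?P<offset>0x[0-9a-f]+)\) "
    r"\[(?P<address>0x[0-9a-f]+)\]$"
)


def parse_frames(text):
    """Return a list of (index, symbol, offset, address) tuples."""
    frames = []
    pending = ""
    for line in text.splitlines():
        piece = line.strip()
        # Wrapped frames continue on the next line, so accumulate pieces.
        pending = (pending + " " + piece) if pending else piece
        match = FRAME_RE.match(pending)
        if match:
            frames.append(
                (
                    int(match["index"]),
                    match["symbol"],
                    match["offset"],
                    match["address"],
                )
            )
            pending = ""
        elif not re.match(r"^\d+: ", pending):
            # Not the start of a frame: discard and wait for the next one.
            pending = ""
    return frames
```

Frames whose symbol comes back as "()" are the unresolved ones; those still need the matching binary and debug symbols, as the NOTE in the log says.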

I have just opened a new tracker issue: http://tracker.ceph.com/issues/18152

In the meantime, we will remove the --client-quota option.

Cheers
Goncalo

