Previously we had multiple active MDS daemons, but at that time we were getting slow/stuck
requests whenever multiple clients accessed the cluster. So we decided to run a
single active MDS and keep all the others as standby.
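For reference, dropping back to a single active MDS looks roughly like this (a sketch assuming a Luminous-era cluster and a filesystem named `cephfs`; both the filesystem name and rank are illustrative):

```shell
# Limit the filesystem to one active MDS rank
ceph fs set cephfs max_mds 1

# On Luminous, extra active ranks had to be deactivated manually;
# newer releases stop the surplus ranks automatically when max_mds is lowered.
ceph mds deactivate cephfs:1

# Verify: one rank up:active, the rest up:standby
ceph fs status cephfs
```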
When we hit this issue, MDS trimming was in progress. When we checked the
in-flight ops:
{
    "ops": [
        {
            "description": "client_request(client.8784398:69290 readdir #0x1000000cf10 2018-06-22 21:16:35.303754 caller_uid=0, caller_gid=0{0,})",
            "initiated_at": "2018-06-22 21:16:35.319622",
            "age": 1982.691792,
            "duration": 1982.691821,
            "type_data": {
                "flag_point": "failed to authpin local pins",
                "reqid": "client.8784398:69290",
                "op_type": "client_request",
                "client_info": {
                    "client": "client.8784398",
                    "tid": 69290
                },
                "events": [
                    {
                        "time": "2018-06-22 21:16:35.319622",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-06-22 21:16:35.319998",
                        "event": "failed to authpin local pins"
                    }
                ]
            }
        },
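Output like the above comes from the MDS admin socket; a typical way to pull it is the following (the daemon name `mds.mds01` is illustrative, substitute your own):

```shell
# Dump the operations currently in flight on the active MDS
ceph daemon mds.mds01 dump_ops_in_flight

# Recently completed slow ops are also available for comparison
ceph daemon mds.mds01 dump_historic_ops
```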
All requests arriving at the server hung and a deadlock situation occurred.
We restarted the hung MDS, after which everything returned to normal. We were
not able to find the root cause, but we saw some posts saying that multi-active
MDS is still not stable, so we changed to a single active MDS.
Regards
Surya
On Tue, Jul 17, 2018 at 3:15 PM, Daniel Baumann <[email protected]>
wrote:
> On 07/17/2018 11:43 AM, Marc Roos wrote:
> > I had similar thing with doing the ls. Increasing the cache limit helped
> > with our test cluster
>
> same here; additionally we also had to use more than one MDS to get good
> performance (currently 3 MDS plus 2 stand-by per FS).
>
> Regards,
> Daniel
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>