Hi,
we have a Ceph cluster with 32 OSDs running on 4 servers (8 OSDs per server,
one for each disk).
From time to time, I see Ceph servers running out of file descriptors. The
affected OSD then logs lines like:
> 2014-06-08 22:15:35.154759 7f850ac25700 0 filestore(/srv/ceph/osd/ceph-20)
> write couldn't open
> 86.37_head/a63e7df7/rbd_data.1933fe2ae8944a.000000000000042c/head//86: (24)
> Too many open files
> 2014-06-08 22:15:35.255955 7f850ac25700 -1 os/FileStore.cc: In function
> 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t,
> int, ThreadPool::TPHandle*)' thread 7f850ac25700 time
> 2014-06-08 22:15:35.191181 os/FileStore.cc: 2448: FAILED assert(0 ==
> "unexpected error")
but apparently everything proceeds normally after that.
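
A note on the error code itself: the "(24)" in these lines is an errno value.
The following is a minimal sketch, not taken from Ceph, that decodes it with
Python's standard library. The helper name describe_errno is made up for
illustration. On Linux, 24 is EMFILE (the per-process descriptor limit was
reached), which is distinct from ENFILE (the system-wide file table is full).

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    # errno.errorcode maps numeric values to names such as "EMFILE".
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 24 is the value reported in the OSD log lines quoted above.
    print(describe_errno(24))
    # ENFILE is the system-wide counterpart, shown for comparison.
    print(describe_errno(errno.ENFILE))
```

If the name printed for 24 is EMFILE, the limit being hit is the one that
applies to the individual process rather than the kernel-wide one.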
Is this error considered critical? Should I lower "max open files" in
ceph.conf? Or should I increase the value in /proc/sys/fs/file-max? Does anyone
have a good recommendation?
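
To judge which of the two settings matters, it helps to compare how many
descriptors a process currently holds against its own limit. Below is a rough
sketch, assuming a Linux /proc filesystem. The function names
parse_nofile_limit and fd_usage are hypothetical, and reading another user's
process (such as a ceph-osd daemon) will normally require root.

```python
import os


def parse_nofile_limit(limits_text):
    """Extract (soft, hard) 'Max open files' values from /proc/<pid>/limits text.

    A value of 'unlimited' is returned as None.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: 'Max', 'open', 'files', <soft>, <hard>, 'files'
            fields = line.split()
            values = []
            for raw in (fields[3], fields[4]):
                values.append(None if raw == "unlimited" else int(raw))
            return values[0], values[1]
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid):
    """Return (open_fds, soft_limit, hard_limit) for a process (Linux only)."""
    # Each entry in /proc/<pid>/fd is one open descriptor of that process.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft, hard = parse_nofile_limit(f.read())
    return open_fds, soft, hard


if __name__ == "__main__":
    # Inspect this script's own process; substitute an OSD pid when run as root.
    if os.path.isdir("/proc/self/fd"):
        print(fd_usage(os.getpid()))
```

A descriptor count close to the soft limit would point at the per-process
setting, while /proc/sys/fs/file-max only bounds the total across all
processes on the host.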
TIA
Christian
Reference:
* we are running Ceph Emperor 0.72.2 on Linux 3.10.7.
* full log follows:
2014-06-08 22:15:34.928660 7f84e6770700 0 <cls> cls/lock/cls_lock.cc:89:
error reading xattr lock.rbd_lock: -24
2014-06-08 22:15:34.934733 7f84e6770700 0 <cls> cls/lock/cls_lock.cc:384:
Could not read lock info: Unknown error -24
2014-06-08 22:15:35.085361 7f84ecf7d700 0 accepter.accepter no incoming
connection? sd = -1 errno 24 Too many open files
2014-06-08 22:15:35.125393 7f84ecf7d700 0 accepter.accepter no incoming
connection? sd = -1 errno 24 Too many open files
2014-06-08 22:15:35.125403 7f84ecf7d700 0 accepter.accepter no incoming
connection? sd = -1 errno 24 Too many open files
2014-06-08 22:15:35.125407 7f84ecf7d700 0 accepter.accepter no incoming
connection? sd = -1 errno 24 Too many open files
2014-06-08 22:15:35.125410 7f84ecf7d700 0 accepter.accepter no incoming
connection? sd = -1 errno 24 Too many open files
2014-06-08 22:15:35.154759 7f850ac25700 0 filestore(/srv/ceph/osd/ceph-20)
write couldn't open
86.37_head/a63e7df7/rbd_data.1933fe2ae8944a.000000000000042c/head//86: (24)
Too many open files
2014-06-08 22:15:35.159074 7f850ac25700 0 filestore(/srv/ceph/osd/ceph-20)
error (24) Too many open files not handled on operation 10 (488954466.1.0, or
op 0, counting from 0)
2014-06-08 22:15:35.159095 7f850ac25700 0 filestore(/srv/ceph/osd/ceph-20)
unexpected error code
2014-06-08 22:15:35.159098 7f850ac25700 0 filestore(/srv/ceph/osd/ceph-20)
transaction dump:
{ "ops": [
{ "op_num": 0,
"op_name": "write",
"collection": "86.37_head",
"oid":
"a63e7df7\/rbd_data.1933fe2ae8944a.000000000000042c\/head\/\/86",
"length": 4096,
"offset": 3104768,
"bufferlist length": 4096},
{ "op_num": 1,
"op_name": "setattr",
"collection": "86.37_head",
"oid":
"a63e7df7\/rbd_data.1933fe2ae8944a.000000000000042c\/head\/\/86",
"name": "_",
"length": 251},
{ "op_num": 2,
"op_name": "setattr",
"collection": "86.37_head",
"oid":
"a63e7df7\/rbd_data.1933fe2ae8944a.000000000000042c\/head\/\/86",
"name": "snapset",
"length": 31}]}
2014-06-08 22:15:35.255955 7f850ac25700 -1 os/FileStore.cc: In function
'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t,
int, ThreadPool::TPHandle*)' thread 7f850ac25700 time
2014-06-08 22:15:35.191181 os/FileStore.cc: 2448: FAILED assert(0 ==
"unexpected error")
--
Dipl.-Inf. Christian Kauhaus <>< · [email protected] · systems administration
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · tel +49 345 219401-11
Python, Pyramid, Plone, Zope · consulting, development, hosting, operations