Hi,

We have a Ceph cluster with 32 OSDs running on 4 servers (8 OSDs per server,
one for each disk).

From time to time, I see our Ceph servers running out of file descriptors. The
affected OSD logs lines like:

> 2014-06-08 22:15:35.154759 7f850ac25700  0 filestore(/srv/ceph/osd/ceph-20)
> write couldn't open
> 86.37_head/a63e7df7/rbd_data.1933fe2ae8944a.000000000000042c/head//86: (24)
> Too many open files
> 2014-06-08 22:15:35.255955 7f850ac25700 -1 os/FileStore.cc: In function
> 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t,
> int, ThreadPool::TPHandle*)' thread 7f850ac25700 time
> 2014-06-08 22:15:35.191181 os/FileStore.cc: 2448: FAILED assert(0 ==
> "unexpected error")

but apparently everything proceeds normally after that.

Is this error considered critical? Should I lower "max open files" in
ceph.conf, or should I increase the value in /proc/sys/fs/file-max? Does anyone
have a good recommendation?
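
To help tell the two limits apart, here is a minimal Python sketch, not
anything shipped with Ceph, that compares the open descriptors of each OSD
process with its per-process limit and with the system-wide table. On Linux,
errno 24 is EMFILE (the per-process limit); the system-wide limit would be
reported as ENFILE (23). The helper names and the process name "ceph-osd" are
assumptions made for illustration.

```python
import os


def _to_int(value):
    """Convert a limits field to int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits, or None."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: 'Max', 'open', 'files', <soft>, <hard>, 'files'
            fields = line.split()
            return (_to_int(fields[3]), _to_int(fields[4]))
    return None


def parse_file_nr(text):
    """Return (allocated, maximum) from the text of /proc/sys/fs/file-nr."""
    allocated, _unused, maximum = (int(x) for x in text.split())
    return (allocated, maximum)


def report(proc_root="/proc", name="ceph-osd"):
    """Print fd usage and limits for every process whose comm equals name."""
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "comm")) as f:
                if f.read().strip() != name:
                    continue
            used = len(os.listdir(os.path.join(base, "fd")))
            with open(os.path.join(base, "limits")) as f:
                limits = parse_max_open_files(f.read())
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        print("pid %s: %d fds open, (soft, hard) limit = %s"
              % (entry, used, limits))


if __name__ == "__main__":
    report()
    try:
        with open("/proc/sys/fs/file-nr") as f:
            print("system-wide (allocated, max): %s" % (parse_file_nr(f.read()),))
    except OSError:
        pass  # not on Linux, or /proc is not mounted
```

Run as root on an OSD host, this should show whether an OSD is close to its
own soft limit or whether the system-wide table is nearly full.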

TIA

Christian


Reference:

* We are running Ceph Emperor 0.72.2 on Linux 3.10.7.

* The full log follows:

2014-06-08 22:15:34.928660 7f84e6770700  0 <cls> cls/lock/cls_lock.cc:89:
error reading xattr lock.rbd_lock: -24
2014-06-08 22:15:34.934733 7f84e6770700  0 <cls> cls/lock/cls_lock.cc:384:
Could not read lock info: Unknown error -24
2014-06-08 22:15:35.085361 7f84ecf7d700  0 accepter.accepter no incoming
connection?  sd = -1 errno 24 Too many open files
2014-06-08 22:15:35.125393 7f84ecf7d700  0 accepter.accepter no incoming
connection?  sd = -1 errno 24 Too many open files
2014-06-08 22:15:35.125403 7f84ecf7d700  0 accepter.accepter no incoming
connection?  sd = -1 errno 24 Too many open files
2014-06-08 22:15:35.125407 7f84ecf7d700  0 accepter.accepter no incoming
connection?  sd = -1 errno 24 Too many open files
2014-06-08 22:15:35.125410 7f84ecf7d700  0 accepter.accepter no incoming
connection?  sd = -1 errno 24 Too many open files
2014-06-08 22:15:35.154759 7f850ac25700  0 filestore(/srv/ceph/osd/ceph-20)
write couldn't open
86.37_head/a63e7df7/rbd_data.1933fe2ae8944a.000000000000042c/head//86: (24)
Too many open files
2014-06-08 22:15:35.159074 7f850ac25700  0 filestore(/srv/ceph/osd/ceph-20)
error (24) Too many open files not handled on operation 10 (488954466.1.0, or
op 0, counting from 0)
2014-06-08 22:15:35.159095 7f850ac25700  0 filestore(/srv/ceph/osd/ceph-20)
unexpected error code
2014-06-08 22:15:35.159098 7f850ac25700  0 filestore(/srv/ceph/osd/ceph-20)
transaction dump:
{ "ops": [
        { "op_num": 0,
          "op_name": "write",
          "collection": "86.37_head",
          "oid": 
"a63e7df7\/rbd_data.1933fe2ae8944a.000000000000042c\/head\/\/86",
          "length": 4096,
          "offset": 3104768,
          "bufferlist length": 4096},
        { "op_num": 1,
          "op_name": "setattr",
          "collection": "86.37_head",
          "oid": 
"a63e7df7\/rbd_data.1933fe2ae8944a.000000000000042c\/head\/\/86",
          "name": "_",
          "length": 251},
        { "op_num": 2,
          "op_name": "setattr",
          "collection": "86.37_head",
          "oid": 
"a63e7df7\/rbd_data.1933fe2ae8944a.000000000000042c\/head\/\/86",
          "name": "snapset",
          "length": 31}]}
2014-06-08 22:15:35.255955 7f850ac25700 -1 os/FileStore.cc: In function
'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t,
int, ThreadPool::TPHandle*)' thread 7f850ac25700 time
2014-06-08 22:15:35.191181 os/FileStore.cc: 2448: FAILED assert(0 ==
"unexpected error")

-- 
Dipl.-Inf. Christian Kauhaus <>< · [email protected] · systems administration
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · tel +49 345 219401-11
Python, Pyramid, Plone, Zope · consulting, development, hosting, operations