You probably just want to increase the ulimit settings. You can change the
OSD setting, but that only covers file descriptors against the backing
store, not sockets for network communication -- the latter is more often
the one that runs out.
-Greg
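
To see which limit is actually being hit, it may help to compare each OSD's open descriptors against its per-process limit. Errno 24 is EMFILE, the per-process limit that ulimit -n controls; hitting the system-wide cap in /proc/sys/fs/file-max would normally show up as errno 23 (ENFILE). The following is only a minimal sketch, assuming a Linux /proc filesystem. The function names and the script name used below are made up for illustration and are not part of Ceph.

```python
#!/usr/bin/env python3
"""Report open file descriptors versus limits for given PIDs (Linux /proc only)."""
import os
import sys


def parse_soft_limit(limits_text):
    """Extract the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns an int, or None if the limit is 'unlimited' or the row is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft = line.split()[3]  # fields: Max, open, files, soft, hard, units
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid="self"):
    """Return (open_fds, soft_limit) for one process; files and sockets both count."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        return open_fds, parse_soft_limit(f.read())


def system_fd_usage():
    """Return (allocated, maximum) file handles system-wide from /proc/sys/fs/file-nr."""
    with open("/proc/sys/fs/file-nr") as f:
        allocated, _unused, maximum = (int(x) for x in f.read().split())
    return allocated, maximum


if __name__ == "__main__":
    # PIDs come from the command line; with none given, inspect this process.
    for pid in sys.argv[1:] or ["self"]:
        used, limit = fd_usage(pid)
        print(f"pid {pid}: {used} fds open, soft limit {limit}")
    allocated, maximum = system_fd_usage()
    print(f"system-wide: {allocated} handles allocated, fs.file-max {maximum}")
```

Saved under a hypothetical name such as fdcheck.py, it could be run as "python3 fdcheck.py $(pidof ceph-osd)". Reading /proc/<pid>/fd for a process owned by another user generally requires root.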

On Thursday, June 12, 2014, Christian Kauhaus <k...@gocept.com> wrote:

> Hi,
>
> we have a Ceph cluster with 32 OSDs running on 4 servers (8 OSDs per server,
> one for each disk).
>
> From time to time, I see Ceph servers running out of file descriptors. It logs
> lines like:
>
> > 2014-06-08 22:15:35.154759 7f850ac25700  0 filestore(/srv/ceph/osd/ceph-20) write couldn't open 86.37_head/a63e7df7/rbd_data.1933fe2ae8944a.000000000000042c/head//86: (24) Too many open files
> > 2014-06-08 22:15:35.255955 7f850ac25700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f850ac25700 time
> > 2014-06-08 22:15:35.191181 os/FileStore.cc: 2448: FAILED assert(0 == "unexpected error")
>
> but apparently everything proceeds normally after that.
>
> Is the error considered critical? Should I lower "max open files" in
> ceph.conf? Or should I increase the value in /proc/sys/fs/file-max? Does
> anyone have a good recommendation?
>
> TIA
>
> Christian
>
>
> Reference:
>
> * we are running Ceph Emperor 0.72.2 on Linux 3.10.7.
>
> * full log follows:
>
> 2014-06-08 22:15:34.928660 7f84e6770700  0 <cls> cls/lock/cls_lock.cc:89: error reading xattr lock.rbd_lock: -24
> 2014-06-08 22:15:34.934733 7f84e6770700  0 <cls> cls/lock/cls_lock.cc:384: Could not read lock info: Unknown error -24
> 2014-06-08 22:15:35.085361 7f84ecf7d700  0 accepter.accepter no incoming connection?  sd = -1 errno 24 Too many open files
> 2014-06-08 22:15:35.125393 7f84ecf7d700  0 accepter.accepter no incoming connection?  sd = -1 errno 24 Too many open files
> 2014-06-08 22:15:35.125403 7f84ecf7d700  0 accepter.accepter no incoming connection?  sd = -1 errno 24 Too many open files
> 2014-06-08 22:15:35.125407 7f84ecf7d700  0 accepter.accepter no incoming connection?  sd = -1 errno 24 Too many open files
> 2014-06-08 22:15:35.125410 7f84ecf7d700  0 accepter.accepter no incoming connection?  sd = -1 errno 24 Too many open files
> 2014-06-08 22:15:35.154759 7f850ac25700  0 filestore(/srv/ceph/osd/ceph-20) write couldn't open 86.37_head/a63e7df7/rbd_data.1933fe2ae8944a.000000000000042c/head//86: (24) Too many open files
> 2014-06-08 22:15:35.159074 7f850ac25700  0 filestore(/srv/ceph/osd/ceph-20) error (24) Too many open files not handled on operation 10 (488954466.1.0, or op 0, counting from 0)
> 2014-06-08 22:15:35.159095 7f850ac25700  0 filestore(/srv/ceph/osd/ceph-20) unexpected error code
> 2014-06-08 22:15:35.159098 7f850ac25700  0 filestore(/srv/ceph/osd/ceph-20) transaction dump:
> { "ops": [
>         { "op_num": 0,
>           "op_name": "write",
>           "collection": "86.37_head",
>           "oid": "a63e7df7\/rbd_data.1933fe2ae8944a.000000000000042c\/head\/\/86",
>           "length": 4096,
>           "offset": 3104768,
>           "bufferlist length": 4096},
>         { "op_num": 1,
>           "op_name": "setattr",
>           "collection": "86.37_head",
>           "oid": "a63e7df7\/rbd_data.1933fe2ae8944a.000000000000042c\/head\/\/86",
>           "name": "_",
>           "length": 251},
>         { "op_num": 2,
>           "op_name": "setattr",
>           "collection": "86.37_head",
>           "oid": "a63e7df7\/rbd_data.1933fe2ae8944a.000000000000042c\/head\/\/86",
>           "name": "snapset",
>           "length": 31}]}
> 2014-06-08 22:15:35.255955 7f850ac25700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f850ac25700 time
> 2014-06-08 22:15:35.191181 os/FileStore.cc: 2448: FAILED assert(0 == "unexpected error")
>
> --
> Dipl.-Inf. Christian Kauhaus <>< · k...@gocept.com · systems administration
> gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
> http://gocept.com · tel +49 345 219401-11
> Python, Pyramid, Plone, Zope · consulting, development, hosting, operations
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Software Engineer #42 @ http://inktank.com | http://ceph.com