On Thu, May 22, 2014 at 4:09 AM, Kenneth Waegeman
<[email protected]> wrote:
>
> ----- Message from Gregory Farnum <[email protected]> ---------
> Date: Wed, 21 May 2014 15:46:17 -0700
>
> From: Gregory Farnum <[email protected]>
> Subject: Re: [ceph-users] Expanding pg's of an erasure coded pool
> To: Kenneth Waegeman <[email protected]>
> Cc: ceph-users <[email protected]>
>
>
>> On Wed, May 21, 2014 at 3:52 AM, Kenneth Waegeman
>> <[email protected]> wrote:
>>>
>>> Thanks! I increased the max processes limit for all daemons quite a lot
>>> (up to ulimit -u 3802720).
>>>
>>> These are the limits for the daemons now..
>>> [root@ ~]# cat /proc/17006/limits
>>> Limit                     Soft Limit   Hard Limit   Units
>>> Max cpu time              unlimited    unlimited    seconds
>>> Max file size             unlimited    unlimited    bytes
>>> Max data size             unlimited    unlimited    bytes
>>> Max stack size            10485760     unlimited    bytes
>>> Max core file size        unlimited    unlimited    bytes
>>> Max resident set          unlimited    unlimited    bytes
>>> Max processes             3802720      3802720      processes
>>> Max open files            32768        32768        files
>>> Max locked memory         65536        65536        bytes
>>> Max address space         unlimited    unlimited    bytes
>>> Max file locks            unlimited    unlimited    locks
>>> Max pending signals       95068        95068        signals
>>> Max msgqueue size         819200       819200       bytes
>>> Max nice priority         0            0
>>> Max realtime priority     0            0
>>> Max realtime timeout      unlimited    unlimited    us
>>>
>>> But this didn't help. Are there other parameters I should change?
>>
>>
>> Hrm, is it exactly the same stack trace? You might need to bump the
>> open files limit as well, although I'd be surprised. :/
>
>
> I increased the open file limit as a test to 128000; still the same results.
>
> Stack trace:
<snip>
> But I see some things happening on the system while doing this too:
>
>
>
> [root@ ~]# ceph osd pool set ecdata15 pgp_num 4096
> set pool 16 pgp_num to 4096
> [root@ ~]# ceph status
> Traceback (most recent call last):
>   File "/usr/bin/ceph", line 830, in <module>
>     sys.exit(main())
>   File "/usr/bin/ceph", line 590, in main
>     conffile=conffile)
>   File "/usr/lib/python2.6/site-packages/rados.py", line 198, in __init__
>     librados_path = find_library('rados')
>   File "/usr/lib64/python2.6/ctypes/util.py", line 209, in find_library
>     return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
>   File "/usr/lib64/python2.6/ctypes/util.py", line 203, in _findSoname_ldconfig
>     os.popen('LANG=C /sbin/ldconfig -p 2>/dev/null').read())
> OSError: [Errno 12] Cannot allocate memory
> [root@ ~]# lsof | wc
> -bash: fork: Cannot allocate memory
> [root@ ~]# lsof | wc
> 21801 211209 3230028
> [root@ ~]# ceph status
> ^CError connecting to cluster: InterruptedOrTimeoutError
> [root@ ~]# lsof | wc
> 2028 17476 190947
>
>
>
> And meanwhile the daemons have crashed.
>
> I verified the memory never ran out.
Is there anything in dmesg? It sure looks like the OS thinks it's run
out of memory one way or another.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com