You have a socket appender that blocks, and this stalls ES. You are probably using TCP rather than UDP; UDP cannot block.
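To make the difference concrete, here is a standalone sketch (plain Python sockets, not the ES/log4j appender code itself): a TCP send() blocks once the peer stops reading and the kernel buffers fill, while a UDP sendto() returns immediately even with nobody listening.

```python
import socket
import threading

# Standalone sketch: a TCP peer that accepts a connection but never reads.
# send() succeeds only until the kernel socket buffers fill, then blocks --
# exactly what stalls a logging thread behind a blocking SocketAppender.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # any free port
server.listen(1)
port = server.getsockname()[1]

def accept_and_stall():
    conn, _ = server.accept()
    threading.Event().wait(10)         # hold the connection open, never read
    conn.close()

threading.Thread(target=accept_and_stall, daemon=True).start()

tcp = socket.create_connection(("127.0.0.1", port))
tcp.settimeout(1.0)                    # fail fast in the demo instead of hanging forever
sent = 0
tcp_blocked = False
try:
    while True:
        sent += tcp.send(b"x" * 65536)
except socket.timeout:
    tcp_blocked = True                 # buffers full: a real appender thread would hang here

# UDP by contrast is fire-and-forget: no connection, no backpressure.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(b"log line", ("127.0.0.1", 9999))   # returns even with no listener

print("TCP blocked after", sent, "bytes; UDP sendto returned immediately")
```

The port 9999 UDP target is arbitrary; the point is only that the call returns regardless of a receiver.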
This has been improved in log4j2, where the SocketAppender can be configured as an async appender which never blocks, even with TCP. Check if you can switch to log4j2: http://logging.apache.org/log4j/2.x/manual/appenders.html

Jörg

socketappender:
    type: org.apache.log4j.net.SocketAppender
    port: 9500
    remoteHost: localhost
    layout:
        type: pattern
        conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

On Thu, Sep 25, 2014 at 6:05 PM, Chris Denneen <[email protected]> wrote:

> Jörg,
>
> I've updated the gist (https://gist.github.com/cdenneen/70049c77fa5fc547428e)
> with logging.yml
>
> And nc shows 9500 as open... the rest are just local files:
>
> [root@rndeslogs1 elasticsearch]# nc -z 127.0.0.1 9500
> Connection to 127.0.0.1 9500 port [tcp/ismserver] succeeded!
> [root@rndeslogs1 elasticsearch]# nc -z localhost 9500
> Connection to localhost 9500 port [tcp/ismserver] succeeded!
>
> -Chris
>
> On Thursday, September 25, 2014 11:54:56 AM UTC-4, Jörg Prante wrote:
>>
>> Check your log4j appenders. They block and ES can't continue.
>>
>> Jörg
>>
>> On Thu, Sep 25, 2014 at 5:05 PM, Chris Denneen <[email protected]> wrote:
>>
>>> Is there any more info I can provide for someone to help here? I'm not
>>> sure what to do other than restarting ES, but that isn't a good long-term
>>> solution every day or so.
>>>
>>> [root@rndeslogs1 elasticsearch]# curl -q localhost:9200/_cluster/health | python -mjson.tool
>>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>>                                  Dload  Upload   Total   Spent    Left  Speed
>>> 116   233  116   233    0     0  10457      0 --:--:-- --:--:-- --:--:-- 13705
>>> {
>>>     "active_primary_shards": 136,
>>>     "active_shards": 136,
>>>     "cluster_name": "logstash-cluster",
>>>     "initializing_shards": 0,
>>>     "number_of_data_nodes": 1,
>>>     "number_of_nodes": 2,
>>>     "relocating_shards": 0,
>>>     "status": "yellow",
>>>     "timed_out": false,
>>>     "unassigned_shards": 12
>>> }
>>>
>>> *The "yellow" status is because I have Marvel installed and only one data
>>> node, but otherwise everything is green... when I DELETE the .marvel*
>>> indices the cluster shows as "green", but because right now I can't
>>> DELETE, CLOSE, or POST data to the cluster, it's showing as yellow.*
>>>
>>> On Wednesday, September 24, 2014 6:16:51 PM UTC-4, Chris Denneen wrote:
>>>>
>>>> If anyone can help me understand why my cluster is hung I would
>>>> appreciate it.
>>>>
>>>> jstack output:
>>>>
>>>> https://gist.github.com/anonymous/075c862cb211ae249707
>>>>
>>>> I am able to query the cluster and health is good, but I can't DELETE
>>>> or CLOSE an index as the cluster is unresponsive.
>>>>
>>>> mlockall is set to true
>>>>
>>>> iostat:
>>>>
>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>            2.00    0.05    0.30    0.08    0.00   97.57
>>>>
>>>> Device:   tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
>>>> sdb      7.40        0.00      939.20         0      4696
>>>> sda      0.40        0.00        4.80         0        24
>>>> dm-0     0.60        0.00        4.80         0        24
>>>> dm-1     0.00        0.00        0.00         0         0
>>>> dm-2   117.40        0.00      939.20         0      4696
>>>>
>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>            2.93    0.03    0.23    0.08    0.00   96.74
>>>>
>>>> Device:   tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
>>>> sdb      6.80        0.00      776.00         0      3880
>>>> sda      0.80        0.00       20.80         0       104
>>>> dm-0     2.60        0.00       20.80         0       104
>>>> dm-1     0.00        0.00        0.00         0         0
>>>> dm-2    97.00        0.00      776.00         0      3880
>>>>
>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>            1.20    0.03    0.25    0.10    0.00   98.42
>>>>
>>>> Device:   tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
>>>> sdb     11.40        0.00     1312.00         0      6560
>>>> sda      0.80        0.00       22.40         0       112
>>>> dm-0     2.80        0.00       22.40         0       112
>>>> dm-1     0.00        0.00        0.00         0         0
>>>> dm-2   164.00        0.00     1312.00         0      6560
>>>>
>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>            7.07    0.03    0.50    0.08    0.00   92.33
>>>>
>>>> Device:   tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
>>>> sdb     20.40        0.00     5064.00         0     25320
>>>> sda      1.00        0.00       25.60         0       128
>>>> dm-0     3.20        0.00       25.60         0       128
>>>> dm-1     0.00        0.00        0.00         0         0
>>>> dm-2   633.00        0.00     5064.00         0     25320
>>>>
>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>            1.23    0.05    0.33    0.10    0.00   98.30
>>>>
>>>> Device:   tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
>>>> sdb     15.20        0.00     2604.80         0     13024
>>>> sda      2.40        0.00       38.40         0       192
>>>> dm-0     4.80        0.00       38.40         0       192
>>>> dm-1     0.00        0.00        0.00         0         0
>>>> dm-2   325.60        0.00     2604.80         0     13024
>>>>
>>>> vmstat:
>>>>
>>>> -bash-4.1$ vmstat 5
>>>> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
>>>>  r  b   swpd   free   buff   cache  si  so   bi    bo    in   cs us sy id wa st
>>>>  0  0      0 141532 163140 1955776   0   0   19    80     2    0  2  0 96  2  0
>>>>  0  0      0 140664 163156 1956428   0   0    0   801   776  719  3  0 97  0  0
>>>>  0  0      0 138880 163164 1958264   0   0    0   776   770  765  2  0 98  0  0
>>>>  0  0      0 133820 163192 1963364   0   0    0  1570  1174  825  4  0 95  0  0
>>>>  1  0      0 129984 163200 1967036   0   0    0  1422  1026  836  4  0 95  0  0
>>>>
>>>> -bash-4.1$ lsof -u elasticsearch | wc -l
>>>> 3004
>>>>
>>>> /etc/security/limits.conf:elasticsearch hard nofile 65536
>>>> /etc/security/limits.conf:elasticsearch soft nofile 65536
>>>> /etc/security/limits.conf:elasticsearch - memlock unlimited
>>>>
>>>> top - 18:15:25 up 18 days, 14:36, 1 user, load average: 0.23, 0.32, 0.32
>>>> Tasks: 190 total, 1 running, 189 sleeping, 0 stopped, 0 zombie
>>>> Cpu(s): 0.5%us, 0.2%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>>>> Mem:  8060812k total, 7928472k used,  132340k free,  164384k buffers
>>>> Swap:       0k total,       0k used,       0k free, 1963024k cached
>>>>
>>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
>>>> 26117 elastics  20   0 55.0g 5.2g 327m S  4.3 68.1   1836:21 java
>>>>  1358 logstash  39  19 5078m 257m  11m S  0.7  3.3 183:28.43 java
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/77017349-b637-450f-8923-7e27c8bfa8d0%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
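For reference, a minimal log4j2 configuration along the lines Jörg suggests: an Async appender wrapping a TCP SocketAppender, so logging calls hand events to a queue instead of writing to the socket directly. This is a sketch based on the linked log4j2 appenders manual, not a tested ES config; the host, port, and pattern are taken from the logging.yml in this thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
  <Appenders>
    <!-- Same TCP socket target as the log4j 1.x socketappender above -->
    <Socket name="socket" host="localhost" port="9500" protocol="TCP">
      <PatternLayout pattern="[%d{ISO8601}][%-5p][%-25c] %m%n"/>
    </Socket>
    <!-- blocking="false" drops events when the queue is full instead of
         stalling the calling thread -->
    <Async name="async" blocking="false">
      <AppenderRef ref="socket"/>
    </Async>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="async"/>
    </Root>
  </Loggers>
</Configuration>
```

With blocking="false" a stalled receiver costs you log lines rather than the node, which is usually the right trade for an ES process.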

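Since the jstack dump is what actually shows the stalled appender, one quick way to triage a dump like the gist above is to count threads per state and then look for threads parked under the socket appender's append/write calls. A generic sketch (the miniature dump below is made up for illustration, not taken from Chris's gist):

```python
import re
from collections import Counter

def thread_states(jstack_text):
    """Count java.lang.Thread.State occurrences in a jstack dump.
    Many BLOCKED threads, or RUNNABLE threads stuck in socketWrite0
    under the log4j SocketAppender, are the smoking gun."""
    return Counter(re.findall(r"java\.lang\.Thread\.State: (\w+)", jstack_text))

# Made-up two-thread dump for illustration:
dump = """\
"elasticsearch[node][bulk][T#1]" daemon prio=10
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.log4j.net.SocketAppender.append(SocketAppender.java)
"main" prio=10
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketOutputStream.socketWrite0(Native Method)
"""
print(thread_states(dump))   # one BLOCKED, one RUNNABLE
```

On a real dump you would read it from the file jstack wrote and then grep the BLOCKED threads' stack frames for the appender class.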