You have a socket appender that blocks, and this stalls ES. You are probably using TCP rather than UDP; UDP cannot block.
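To make the difference concrete, here is a standalone sketch (plain Python sockets, not the ES/log4j appender code itself): a TCP send() blocks once the peer stops reading and the kernel buffers fill, while a UDP sendto() returns immediately even with nobody listening.

```python
import socket
import threading

# Standalone sketch: a TCP peer that accepts a connection but never reads.
# send() succeeds only until the kernel socket buffers fill, then blocks --
# exactly what stalls a logging thread behind a blocking SocketAppender.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # any free port
server.listen(1)
port = server.getsockname()[1]

def accept_and_stall():
    conn, _ = server.accept()
    threading.Event().wait(10)         # hold the connection open, never read
    conn.close()

threading.Thread(target=accept_and_stall, daemon=True).start()

tcp = socket.create_connection(("127.0.0.1", port))
tcp.settimeout(1.0)                    # fail fast in the demo instead of hanging forever
sent = 0
tcp_blocked = False
try:
    while True:
        sent += tcp.send(b"x" * 65536)
except socket.timeout:
    tcp_blocked = True                 # buffers full: a real appender thread would hang here

# UDP by contrast is fire-and-forget: no connection, no backpressure.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(b"log line", ("127.0.0.1", 9999))   # returns even with no listener

print("TCP blocked after", sent, "bytes; UDP sendto returned immediately")
```

The port 9999 UDP target is arbitrary; the point is only that the call returns regardless of a receiver.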
This has been improved in log4j2, where the SocketAppender can be configured as an async appender which never blocks, even with TCP. Check if you can switch to log4j2: http://logging.apache.org/log4j/2.x/manual/appenders.html

Jörg

socketappender:
    type: org.apache.log4j.net.SocketAppender
    port: 9500
    remoteHost: localhost
    layout:
        type: pattern
        conversionPattern: "[%d{ISO8601}][%-5p][%-25c] %m%n"

On Thu, Sep 25, 2014 at 6:05 PM, Chris Denneen <[email protected]> wrote:

> Jörg,
>
> I've updated the gist (https://gist.github.com/cdenneen/70049c77fa5fc547428e)
> with logging.yml
>
> And nc shows 9500 as open... the rest are just local files:
>
> [root@rndeslogs1 elasticsearch]# nc -z 127.0.0.1 9500
> Connection to 127.0.0.1 9500 port [tcp/ismserver] succeeded!
> [root@rndeslogs1 elasticsearch]# nc -z localhost 9500
> Connection to localhost 9500 port [tcp/ismserver] succeeded!
>
> -Chris
>
> On Thursday, September 25, 2014 11:54:56 AM UTC-4, Jörg Prante wrote:
>>
>> Check your log4j appenders. They block and ES can't continue.
>>
>> Jörg
>>
>> On Thu, Sep 25, 2014 at 5:05 PM, Chris Denneen <[email protected]> wrote:
>>
>>> Is there any more info I can provide for someone to help here? I'm not
>>> sure what to do other than restarting ES, but that isn't a good long-term
>>> solution every day or so.
>>>
>>> [root@rndeslogs1 elasticsearch]# curl -q localhost:9200/_cluster/health | python -mjson.tool
>>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>>                                  Dload  Upload   Total   Spent    Left  Speed
>>> 116   233  116   233    0     0  10457      0 --:--:-- --:--:-- --:--:-- 13705
>>> {
>>>     "active_primary_shards": 136,
>>>     "active_shards": 136,
>>>     "cluster_name": "logstash-cluster",
>>>     "initializing_shards": 0,
>>>     "number_of_data_nodes": 1,
>>>     "number_of_nodes": 2,
>>>     "relocating_shards": 0,
>>>     "status": "yellow",
>>>     "timed_out": false,
>>>     "unassigned_shards": 12
>>> }
>>>
>>> *The "yellow" status is because I have Marvel installed and only one data
>>> node, but otherwise everything is green... when I DELETE the .marvel*
>>> indices the cluster shows as "green", but because right now I can't
>>> DELETE, CLOSE, or POST data to the cluster, it's showing as yellow.*
>>>
>>> On Wednesday, September 24, 2014 6:16:51 PM UTC-4, Chris Denneen wrote:
>>>>
>>>> If anyone can help me understand why my cluster is hung I would
>>>> appreciate it.
>>>>
>>>> jstack output:
>>>>
>>>> https://gist.github.com/anonymous/075c862cb211ae249707
>>>>
>>>> I am able to query the cluster and health is good, but I can't DELETE
>>>> or CLOSE an index as the cluster is unresponsive.
>>>>
>>>> mlockall is set to true
>>>>
>>>> iostat:
>>>>
>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>            2.00    0.05    0.30    0.08    0.00   97.57
>>>>
>>>> Device:   tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
>>>> sdb      7.40        0.00      939.20         0      4696
>>>> sda      0.40        0.00        4.80         0        24
>>>> dm-0     0.60        0.00        4.80         0        24
>>>> dm-1     0.00        0.00        0.00         0         0
>>>> dm-2   117.40        0.00      939.20         0      4696
>>>>
>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>            2.93    0.03    0.23    0.08    0.00   96.74
>>>>
>>>> Device:   tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
>>>> sdb      6.80        0.00      776.00         0      3880
>>>> sda      0.80        0.00       20.80         0       104
>>>> dm-0     2.60        0.00       20.80         0       104
>>>> dm-1     0.00        0.00        0.00         0         0
>>>> dm-2    97.00        0.00      776.00         0      3880
>>>>
>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>            1.20    0.03    0.25    0.10    0.00   98.42
>>>>
>>>> Device:   tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
>>>> sdb     11.40        0.00     1312.00         0      6560
>>>> sda      0.80        0.00       22.40         0       112
>>>> dm-0     2.80        0.00       22.40         0       112
>>>> dm-1     0.00        0.00        0.00         0         0
>>>> dm-2   164.00        0.00     1312.00         0      6560
>>>>
>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>            7.07    0.03    0.50    0.08    0.00   92.33
>>>>
>>>> Device:   tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
>>>> sdb     20.40        0.00     5064.00         0     25320
>>>> sda      1.00        0.00       25.60         0       128
>>>> dm-0     3.20        0.00       25.60         0       128
>>>> dm-1     0.00        0.00        0.00         0         0
>>>> dm-2   633.00        0.00     5064.00         0     25320
>>>>
>>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>>            1.23    0.05    0.33    0.10    0.00   98.30
>>>>
>>>> Device:   tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
>>>> sdb     15.20        0.00     2604.80         0     13024
>>>> sda      2.40        0.00       38.40         0       192
>>>> dm-0     4.80        0.00       38.40         0       192
>>>> dm-1     0.00        0.00        0.00         0         0
>>>> dm-2   325.60        0.00     2604.80         0     13024
>>>>
>>>> vmstat:
>>>>
>>>> -bash-4.1$ vmstat 5
>>>> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
>>>>  r  b   swpd   free   buff   cache  si  so   bi    bo    in   cs us sy id wa st
>>>>  0  0      0 141532 163140 1955776   0   0   19    80     2    0  2  0 96  2  0
>>>>  0  0      0 140664 163156 1956428   0   0    0   801   776  719  3  0 97  0  0
>>>>  0  0      0 138880 163164 1958264   0   0    0   776   770  765  2  0 98  0  0
>>>>  0  0      0 133820 163192 1963364   0   0    0  1570  1174  825  4  0 95  0  0
>>>>  1  0      0 129984 163200 1967036   0   0    0  1422  1026  836  4  0 95  0  0
>>>>
>>>> -bash-4.1$ lsof -u elasticsearch | wc -l
>>>> 3004
>>>>
>>>> /etc/security/limits.conf:elasticsearch hard nofile 65536
>>>> /etc/security/limits.conf:elasticsearch soft nofile 65536
>>>> /etc/security/limits.conf:elasticsearch - memlock unlimited
>>>>
>>>> top - 18:15:25 up 18 days, 14:36, 1 user, load average: 0.23, 0.32, 0.32
>>>> Tasks: 190 total, 1 running, 189 sleeping, 0 stopped, 0 zombie
>>>> Cpu(s): 0.5%us, 0.2%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>>>> Mem:  8060812k total, 7928472k used,  132340k free,  164384k buffers
>>>> Swap:       0k total,       0k used,       0k free, 1963024k cached
>>>>
>>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
>>>> 26117 elastics  20   0 55.0g 5.2g 327m S  4.3 68.1   1836:21 java
>>>>  1358 logstash  39  19 5078m 257m  11m S  0.7  3.3 183:28.43 java
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/77017349-b637-450f-8923-7e27c8bfa8d0%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
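For reference, a minimal log4j2 configuration along the lines Jörg suggests: an Async appender wrapping a TCP SocketAppender, so logging calls hand events to a queue instead of writing to the socket directly. This is a sketch based on the linked log4j2 appenders manual, not a tested ES config; the host, port, and pattern are taken from the logging.yml in this thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
  <Appenders>
    <!-- Same TCP socket target as the log4j 1.x socketappender above -->
    <Socket name="socket" host="localhost" port="9500" protocol="TCP">
      <PatternLayout pattern="[%d{ISO8601}][%-5p][%-25c] %m%n"/>
    </Socket>
    <!-- blocking="false" drops events when the queue is full instead of
         stalling the calling thread -->
    <Async name="async" blocking="false">
      <AppenderRef ref="socket"/>
    </Async>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="async"/>
    </Root>
  </Loggers>
</Configuration>
```

With blocking="false" a stalled receiver costs you log lines rather than the node, which is usually the right trade for an ES process.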

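Since the jstack dump is what actually shows the stalled appender, one quick way to triage a dump like the gist above is to count threads per state and then look for threads parked under the socket appender's append/write calls. A generic sketch (the miniature dump below is made up for illustration, not taken from Chris's gist):

```python
import re
from collections import Counter

def thread_states(jstack_text):
    """Count java.lang.Thread.State occurrences in a jstack dump.
    Many BLOCKED threads, or RUNNABLE threads stuck in socketWrite0
    under the log4j SocketAppender, are the smoking gun."""
    return Counter(re.findall(r"java\.lang\.Thread\.State: (\w+)", jstack_text))

# Made-up two-thread dump for illustration:
dump = """\
"elasticsearch[node][bulk][T#1]" daemon prio=10
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.log4j.net.SocketAppender.append(SocketAppender.java)
"main" prio=10
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketOutputStream.socketWrite0(Native Method)
"""
print(thread_states(dump))   # one BLOCKED, one RUNNABLE
```

On a real dump you would read it from the file jstack wrote and then grep the BLOCKED threads' stack frames for the appender class.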