Get node2 running with just the rock index. Then issue a disable_allocation and then bring
up node1.
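The whole sequence this reply describes (freeze allocation, drop replicas to 0 so nothing gets duplicated, then unfreeze) can be sketched end to end as below. This is a hedged sketch for ES 1.x, where `cluster.routing.allocation.disable_allocation` still works (later versions use `cluster.routing.allocation.enable`); the `_all` index pattern for the replica step is an assumption, not from the thread.

```shell
#!/bin/sh
# Hedged sketch of the allocation-freeze sequence for ES 1.x.
# Host/port taken from the thread; verify settings names on your version.
ES=localhost:9200

# Settings bodies for each step:
DISABLE='{"transient":{"cluster.routing.allocation.disable_allocation":true}}'
NO_REPLICAS='{"index":{"number_of_replicas":0}}'
ENABLE='{"transient":{"cluster.routing.allocation.disable_allocation":false}}'

# Against a live cluster you would run these in order
# (commented out here so the sketch is safe to execute as-is):
# curl -XPUT "$ES/_cluster/settings" -d "$DISABLE"     # 1. freeze allocation
# curl -XPUT "$ES/_all/_settings"    -d "$NO_REPLICAS" # 2. no replica copies
# curl -XPUT "$ES/_cluster/settings" -d "$ENABLE"      # 3. unfreeze
echo "$DISABLE"
```

Dropping replicas matters because with two nodes and `number_of_replicas: 1`, every index would otherwise end up on both nodes anyway.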
curl -XPUT localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.disable_allocation":true}}'

From there, adjust the replica settings on the indexes down to 0 so they don't copy. Once that's set, change disable_allocation to false.

On Mon, May 5, 2014 at 1:19 PM, Nish <[email protected]> wrote:

> *"- Fire up both nodes, make sure they both have the same cluster name"* <=
> This is exactly where Elasticsearch is messing up, as I wrote in my second
> message. When I move the index to a new node, delete that index from the
> master, and then start the master node and the other data node, it (the
> master) throws a message:
> "auto importing dangled indices"
> This means the master is now copying the "deleted" index, which exists only
> on the other node, back to itself!
>
> Basically this is what happens:
>
> 1. Node1 (master): rock, paper, scissors
> 2. I move rock from Node1 to Node2 (I verify by starting ONLY Node1, and I
>    can see that I am missing the data that was originally in the "rock"
>    index, as expected; all good)
> 3. So Node1 now has paper, scissors
> 4. I start Node2 with ONLY the "rock" index (verified independently; it
>    works)
> 5. Then I start Node1 (master) and Node2 (data)
> 6. Node1 says "hey, I don't have rock, but Node2 has it, let me copy it to
>    myself"
>
> On Monday, May 5, 2014 3:44:17 PM UTC-4, Nate Fox wrote:
>
>> You might turn off the bootstrap.mlockall flag just for now - it'll make
>> ES swap a ton, but your error message looks like an OS-level issue. Make
>> sure you have lots of swap available and grab some coffee.
>>
>> What I'd also try if turning off bootstrap.mlockall doesn't work:
>> - Tarball the entire data directory and save the tarball somewhere
>>   (unless you don't care about the data)
>> - Set 31Gb for your ES heap. There's plenty of docs out there that say
>>   not to go over 32Gb of RAM because it'll cause Java to go into 64-bit mode.
>> - Copy the entire data dir to node2
>> - Go into the data dir on node1 and delete half of the indexes
>> - Go into the data dir on node2 and delete the *other* half of the indexes
>> - Fire up both nodes, make sure they both have the same cluster name
>>
>> I have no idea if this'll work; I'm by no means an ES expert. :)
>>
>> On Mon, May 5, 2014 at 12:32 PM, Nish <[email protected]> wrote:
>>
>>> Currently I have 279 indexes on a single node, and Elasticsearch starts,
>>> runs for a few minutes, and dies. I only have 60G of RAM, and as far as I
>>> know 60% is the max that one should allocate to Elasticsearch; I tried
>>> allocating 38G and it lasted a few more minutes before it died.
>>>
>>> *(I think there's some state files that tell ES/Lucene which indexes are
>>> on disk)* => Where is this? How do I fix it so that it doesn't move all
>>> indexes to all nodes? I want to split the ~280 indexes into two nodes of
>>> 140 each. So far I am not able to achieve this, as the master keeps
>>> moving indexes to itself!
>>>
>>> On Monday, May 5, 2014 3:25:05 PM UTC-4, Nate Fox wrote:
>>>>
>>>> How many indexes do you have? It almost looks like the system itself
>>>> can't allocate the RAM needed.
>>>> You might try jacking up nofile to something like 999999 as well.
>>>> I'd definitely go with a 31g heap size.
>>>>
>>>> As for moving indexes, you might be able to copy the entire data store,
>>>> then remove some (I think there's some state files that tell ES/Lucene
>>>> which indexes are on disk), so it might recover if it's missing some and
>>>> sees the others on another node.
>>>>
>>>> As for your other questions, it depends on usage as to how many nodes -
>>>> especially search activity while indexing. We have 230 indexes (1740
>>>> shards) on 8 data nodes (5.7Tb / 6.1B docs), so it can definitely handle
>>>> a lot more than what you're throwing at it. We don't search often, nor
>>>> do we load a ton of data at once.
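The copy-then-prune steps quoted above can be sketched as below. The on-disk layout `data/<cluster_name>/nodes/0/indices/<index>/` is an assumption (it matches ES 1.x local-gateway installs, but verify the real path before touching anything); the demo runs against a scratch directory using the thread's rock/paper/scissors index names, and the rsync destination is purely illustrative.

```shell
#!/bin/sh
set -e
# Sketch of "copy the data dir, then delete half the indexes on each node".
# A scratch directory stands in for the assumed ES 1.x layout
# data/<cluster_name>/nodes/0/indices/<index>/ -- verify on your install.
SCRATCH=$(mktemp -d)
DATA="$SCRATCH/elasticsearch/nodes/0/indices"
mkdir -p "$DATA/rock" "$DATA/paper" "$DATA/scissors"

# Step 0: tarball everything first, as the reply suggests.
tar -C "$SCRATCH" -czf "$SCRATCH/es-data-backup.tgz" elasticsearch

# Step 1: copying the data dir to node2 would be something like:
#   rsync -a "$DATA/" node2:/path/to/data/<cluster_name>/nodes/0/indices/
# Step 2: delete half the indexes here (node1 keeps paper + scissors)
# and delete the *other* half on node2 (node2 keeps rock).
rm -rf "$DATA/rock"
ls "$DATA"
```

Note that both nodes must stay down while the files are moved; as the rest of the thread shows, the master will otherwise re-import "dangled" indices it sees on the other node.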
>>>>
>>>> On Sunday, May 4, 2014 7:13:09 AM UTC-7, Nish wrote:
>>>>>
>>>>> Elasticsearch is set up as a single-node instance on a 60G RAM,
>>>>> 32x2.6GHz machine. I am actively indexing historic data with Logstash.
>>>>> It worked well with ~300 million documents (search and indexing were
>>>>> doing OK), but all of a sudden ES fails to start and stay up. It starts,
>>>>> and for a few minutes I can query, but then it fails with an
>>>>> out-of-memory error. I monitor the memory, and at least 12G of memory is
>>>>> available when it fails. I had set the es_heap_size to 31G and then
>>>>> reduced it to 28, 24 and 18, with the same error every time (see dump
>>>>> below).
>>>>>
>>>>> *My security limits are as under (this is a test/POC server, thus the
>>>>> "root" user):*
>>>>>
>>>>> root soft nofile 65536
>>>>> root hard nofile 65536
>>>>> root - memlock unlimited
>>>>>
>>>>> *ES settings:*
>>>>> config]# grep -v "^#" elasticsearch.yml | grep -v "^$"
>>>>> bootstrap.mlockall: true
>>>>>
>>>>> *echo $ES_HEAP_SIZE*
>>>>> 18432m
>>>>>
>>>>> ---DUMP----
>>>>>
>>>>> # bin/elasticsearch
>>>>> [2014-05-04 13:30:12,653][INFO ][node] [Sabretooth] version[1.1.1], pid[19309], build[f1585f0/2014-04-16T14:27:12Z]
>>>>> [2014-05-04 13:30:12,653][INFO ][node] [Sabretooth] initializing ...
>>>>> [2014-05-04 13:30:12,669][INFO ][plugins] [Sabretooth] loaded [], sites []
>>>>> [2014-05-04 13:30:15,390][INFO ][node] [Sabretooth] initialized
>>>>> [2014-05-04 13:30:15,390][INFO ][node] [Sabretooth] starting ...
>>>>> [2014-05-04 13:30:15,531][INFO ][transport] [Sabretooth] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.109.136.59:9300]}
>>>>> [2014-05-04 13:30:18,553][INFO ][cluster.service] [Sabretooth] new_master [Sabretooth][eocFkTYMQnSTUar94A2vHw][ip-10-109-136-59][inet[/10.109.136.59:9300]], reason: zen-disco-join (elected_as_master)
>>>>> [2014-05-04 13:30:18,579][INFO ][discovery] [Sabretooth] elasticsearch/eocFkTYMQnSTUar94A2vHw
>>>>> [2014-05-04 13:30:18,790][INFO ][http] [Sabretooth] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.109.136.59:9200]}
>>>>> [2014-05-04 13:30:19,976][INFO ][gateway] [Sabretooth] recovered [278] indices into cluster_state
>>>>> [2014-05-04 13:30:19,984][INFO ][node] [Sabretooth] started
>>>>> OpenJDK 64-Bit Server VM warning: Attempt to protect stack guard pages failed.
>>>>> OpenJDK 64-Bit Server VM warning: Attempt to deallocate stack guard pages failed.
>>>>> OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007f7c70000, 196608, 0) failed; error='Cannot allocate memory' (errno=12)
>>>>> #
>>>>> # There is insufficient memory for the Java Runtime Environment to continue.
>>>>> # Native memory allocation (malloc) failed to allocate 196608 bytes for committing reserved memory.
>>>>> # An error report file with more information is saved as:
>>>>> # /tmp/jvm-19309/hs_error.log
>>>>>
>>>>> ----
>>>>> *User untergeek on #logstash told me that I have reached the max number
>>>>> of indices on a single node. Here are my questions:*
>>>>>
>>>>> 1. Can I move half of my indexes to a new node? If yes, how to do that
>>>>>    without compromising the indexes?
>>>>> 2. Logstash makes 1 index per day and I want to have 2 years of data
>>>>>    indexable; can I combine multiple indexes into one?
Like one >>>>> month >>>>> per month : this will mean I will not have more than 24 indexes. >>>>> 3. How many nodes are ideal for 24 moths of data ~1.5G a day >>>>> >>>>> -- >>> You received this message because you are subscribed to a topic in the >>> Google Groups "elasticsearch" group. >>> To unsubscribe from this topic, visit https://groups.google.com/d/ >>> topic/elasticsearch/cEimyMnhSv0/unsubscribe. >>> To unsubscribe from this group and all its topics, send an email to >>> [email protected]. >>> >>> To view this discussion on the web visit https://groups.google.com/d/ >>> msgid/elasticsearch/564e2951-ed54-4f34-97a9-4de88f187a7a% >>> 40googlegroups.com<https://groups.google.com/d/msgid/elasticsearch/564e2951-ed54-4f34-97a9-4de88f187a7a%40googlegroups.com?utm_medium=email&utm_source=footer> >>> . >>> For more options, visit https://groups.google.com/d/optout. >>> >> >> -- > You received this message because you are subscribed to a topic in the > Google Groups "elasticsearch" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/elasticsearch/cEimyMnhSv0/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > [email protected]. > To view this discussion on the web visit > https://groups.google.com/d/msgid/elasticsearch/5de77e8a-46dd-43c9-b4ad-557d117072ff%40googlegroups.com<https://groups.google.com/d/msgid/elasticsearch/5de77e8a-46dd-43c9-b4ad-557d117072ff%40googlegroups.com?utm_medium=email&utm_source=footer> > . > > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAHU4sP_02AfqaFOdZU6ZOmua32BuG4w2tv125Vyu2j7HAZy93w%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
