To help further with this, here are our gateway settings:
gateway:
  recover_after_nodes: 20
  recover_after_time: 5m
  expected_nodes: 20
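The Python model below makes the interaction of these three values concrete. It is a simplification of the documented gateway gating rules, not Elasticsearch's GatewayService code. Recovery waits until recover_after_nodes nodes have joined. It starts immediately once expected_nodes is reached, and otherwise it starts after recover_after_time has elapsed. The names GatewaySettings and recovery_can_start are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GatewaySettings:
    """Hypothetical container for the gateway.* values quoted above."""
    recover_after_nodes: int
    recover_after_time_s: float  # "5m" expressed in seconds
    expected_nodes: int          # <= 0 treated as "not set" in this model


def recovery_can_start(s: GatewaySettings, joined_nodes: int,
                       seconds_since_threshold: float) -> bool:
    """Simplified gating model, not Elasticsearch source code.

    seconds_since_threshold: time since joined_nodes first reached
    recover_after_nodes.
    """
    if joined_nodes < s.recover_after_nodes:
        return False  # too few nodes: keep waiting, however long it takes
    if s.expected_nodes > 0 and joined_nodes >= s.expected_nodes:
        return True   # all expected nodes are present: recover right away
    return seconds_since_threshold >= s.recover_after_time_s


settings = GatewaySettings(recover_after_nodes=20, recover_after_time_s=5 * 60,
                           expected_nodes=20)
print(recovery_can_start(settings, 19, 3600))  # False
print(recovery_can_start(settings, 20, 0))     # True
```

Under this model, setting recover_after_nodes and expected_nodes to the same value (20 here) means recover_after_time never comes into play. With 19 nodes, nothing happens no matter how long you wait. With 20 nodes, recovery begins right away.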
Thanks,
Rohit
On Mon, Jun 23, 2014 at 1:02 PM, Rohit Jaiswal <[email protected]>
wrote:
> Hi Boaz,
> How can we fix this issue? (https://github.com/elasticsearch/elasticsearch/issues/4502)
>
> Will this work?
> 1. Take a backup of the data and local gateway directories of each ES node
> before restarting the node.
> 2. Disable shard allocation.
> 3. Restart the node.
> 4. Copy the data and gateway directories from the backup into the node's
> data and gateway directories.
> 5. Re-enable shard allocation.
> 6. Based on the recovery settings, index recovery from the gateway will
> start once gateway.recover_after_time has elapsed.
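Steps 2 and 5 correspond to a dynamic, cluster-wide setting changed through the cluster update settings API (PUT /_cluster/settings). It only needs to be set once for the whole cluster, not on each node. The sketch below builds that request body. On 0.90.x the setting is cluster.routing.allocation.disable_allocation, and 1.0 replaced it with cluster.routing.allocation.enable. The helper name is hypothetical. The helper only builds the JSON; sending it is left to whatever HTTP client you use.

```python
import json


def allocation_settings_body(disabled: bool, es_version: tuple) -> str:
    """Build a body for PUT /_cluster/settings that toggles shard allocation.

    Sketch only: it builds the JSON, it does not send it.
    """
    if es_version < (1, 0):
        # 0.90.x setting (deprecated in 1.0)
        setting = {"cluster.routing.allocation.disable_allocation": disabled}
    else:
        # 1.0+ replacement setting
        setting = {"cluster.routing.allocation.enable": "none" if disabled else "all"}
    # "transient" settings do not survive a full cluster restart
    return json.dumps({"transient": setting})


print(allocation_settings_body(True, (0, 90, 2)))
# {"transient": {"cluster.routing.allocation.disable_allocation": true}}
```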
>
> Thanks,
> Rohit
>
>
> On Sun, Jun 22, 2014 at 1:30 PM, Boaz Leskes <[email protected]> wrote:
>
>> Not that I know of. But there is a known but very rare bug (fixed in
>> 0.90.8) which can cause data loss upon a node restart:
>> https://github.com/elasticsearch/elasticsearch/issues/4502
>>
>> Maybe you ran into that?
>>
>>
>> On Sun, Jun 22, 2014 at 10:18 PM, Rohit Jaiswal <[email protected]>
>> wrote:
>>
>>> Yes, it did when we restarted the node while trying to reproduce this
>>> problem. We were also able to access the data using the scan search API
>>> after restarting the node.
>>>
>>> However, we have seen quite a few of these bulk update errors in our
>>> 20-node production cluster and have suffered data loss on other aliases
>>> (the alias filter being the user ID) as well. We think the data loss is
>>> caused by this bulk update error.
>>>
>>> Is there a chance of losing data on shards when enough of these bulk
>>> updates happen concurrently on multiple aliases (users)?
>>>
>>> Thanks
>>>
>>>
>>> On Sun, Jun 22, 2014 at 1:10 PM, Boaz Leskes <[email protected]> wrote:
>>>
>>>> If you restart the node it's on, it doesn't come back?
>>>>
>>>>
>>>> On Sun, Jun 22, 2014 at 10:01 PM, Rohit Jaiswal <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Boaz,
>>>>> Thanks for replying. After we get this error, the cluster health
>>>>> changes to yellow with a replica shard in the unassigned state.
>>>>> Is there a specific way to recover that shard? We don't want to lose
>>>>> other data on that shard.
>>>>>
>>>>> Thanks,
>>>>> Rohit
>>>>>
>>>>>
>>>>> On Sun, Jun 22, 2014 at 12:50 PM, Boaz Leskes <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Rohit,
>>>>>>
>>>>>> With this issue the update fails anyway, but it also breaks the entire
>>>>>> request. You should indeed set the retry_on_conflict option to make the
>>>>>> update request succeed. PS: you should really upgrade; a lot has
>>>>>> changed and been fixed since 0.90.2.
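For the bulk path, the retry count goes in each update action's metadata line. In the 0.90.x bulk format the key is _retry_on_conflict. The single-document _update API takes retry_on_conflict as a URL parameter instead. Below is a minimal Python sketch that builds such a body. The index name, type name, script, and helper are hypothetical placeholders.

```python
import json


def bulk_update_body(index, doc_type, updates, retries=3):
    """Build an NDJSON _bulk body of scripted update actions.

    updates: iterable of (doc_id, script, params) tuples (hypothetical shape).
    """
    lines = []
    for doc_id, script, params in updates:
        # Action/metadata line; _retry_on_conflict asks the server to re-run
        # the update if the document changed between its read and its write.
        lines.append(json.dumps({"update": {
            "_index": index, "_type": doc_type, "_id": doc_id,
            "_retry_on_conflict": retries}}))
        # Source line carrying the script and its params.
        lines.append(json.dumps({"script": script, "params": params}))
    # The bulk API expects every line, including the last, to end in "\n".
    return "\n".join(lines) + "\n"


body = bulk_update_body("myindex", "sample",
                        [("1", "ctx._source.counter += n", {"n": 1})])
print(body)
```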
>>>>>>
>>>>>> Cheers,
>>>>>> Boaz
>>>>>>
>>>>>>
>>>>>> On Monday, June 16, 2014 10:26:06 PM UTC+2, Rohit Jaiswal wrote:
>>>>>>>
>>>>>>> Hi Boaz,
>>>>>>> We are using 0.90.2 and ran into this issue. As I
>>>>>>> understand it, one option is to upgrade to 0.90.3. If we keep using
>>>>>>> 0.90.2 and set (or increase) retry_on_conflict, will we avoid the
>>>>>>> problem? Please clarify.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Rohit
>>>>>>> On Wednesday, August 7, 2013 9:39:56 AM UTC-7, Boaz Leskes wrote:
>>>>>>>
>>>>>>>> Hi Eric,
>>>>>>>>
>>>>>>>> OK. Based on the gist you sent, I tracked down a problem and fixed
>>>>>>>> it: https://github.com/elasticsearch/elasticsearch/issues/3448 .
>>>>>>>> Thanks!! The fix is part of 0.90.3, so I'd recommend upgrading. This
>>>>>>>> is a secondary problem that occurs when two requests try to update the
>>>>>>>> same document at exactly the same time. One of them succeeds and the
>>>>>>>> other fails with a version conflict (that error was masked by the error
>>>>>>>> you were seeing). You can use (or increase) the retry_on_conflict
>>>>>>>> parameter to make the failing request try again.
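The race described here is ordinary optimistic concurrency control. Both requests read the same document version, the first write bumps the version, and the second write is rejected. The toy Python model below is not Elasticsearch internals; Store, update_with_retry, and the racer hook are all hypothetical. It shows why retry_on_conflict helps: on a conflict, the update is re-run against the freshly read document instead of failing.

```python
class VersionConflict(Exception):
    pass


class Store:
    """Toy versioned document store (model only, not Elasticsearch)."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, source dict)

    def get(self, doc_id):
        return self.docs[doc_id]

    def put_if_version(self, doc_id, expected_version, source):
        current_version, _ = self.docs[doc_id]
        if current_version != expected_version:
            raise VersionConflict(doc_id)
        self.docs[doc_id] = (current_version + 1, source)


def update_with_retry(store, doc_id, fn, retry_on_conflict=0, before_write=None):
    """Read-modify-write, re-run up to retry_on_conflict times on conflict.

    before_write is a hook used here only to simulate a concurrent writer.
    Returns the number of retries that were needed.
    """
    for attempt in range(retry_on_conflict + 1):
        version, source = store.get(doc_id)
        new_source = fn(dict(source))
        if before_write:
            before_write(attempt)
        try:
            store.put_if_version(doc_id, version, new_source)
            return attempt
        except VersionConflict:
            if attempt == retry_on_conflict:
                raise


def increment(src):
    src["count"] = src.get("count", 0) + 1
    return src


def make_racer(store):
    def racer(attempt):
        if attempt == 0:  # a competing request wins the race exactly once
            v, s = store.get("1")
            store.put_if_version("1", v, increment(dict(s)))
    return racer


# Scenario 1: no retries, so the request that loses the race fails.
store = Store()
store.docs["1"] = (1, {"count": 0})
try:
    update_with_retry(store, "1", increment, 0, make_racer(store))
    conflicted = False
except VersionConflict:
    conflicted = True
print(conflicted, store.get("1"))  # True (2, {'count': 1})

# Scenario 2: retry_on_conflict=3, so the loser re-reads and succeeds.
store2 = Store()
store2.docs["1"] = (1, {"count": 0})
retries_used = update_with_retry(store2, "1", increment, 3, make_racer(store2))
print(retries_used, store2.get("1"))  # 1 (3, {'count': 2})
```

Without retries, the second increment is lost to a conflict error. With retries, both increments land and the counter ends at 2.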
>>>>>>>>
>>>>>>>> I'm still curious about your report of losing replicas. Can you
>>>>>>>> elaborate on what happens? Do you see anything in the logs?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Boaz
>>>>>>>>
>>>>>>>> On Tuesday, August 6, 2013 5:09:26 AM UTC+2, Eric Sites wrote:
>>>>>>>>>
>>>>>>>>> Boaz,
>>>>>>>>>
>>>>>>>>> Sorry, but I no longer have those logs; I upgraded to 0.90.2 from
>>>>>>>>> 0.90.0 and wiped the logs when I did.
>>>>>>>>> I did the upgrade to use the _bulk API for my updates.
>>>>>>>>>
>>>>>>>>> Basically, the "lang": "js" setting was not the issue.
>>>>>>>>>
>>>>>>>>> I was using different scripts with the same set of params and an
>>>>>>>>> upsert. The fix was to use a different param name for each script,
>>>>>>>>> about 10 unique scripts in total.
>>>>>>>>>
>>>>>>>>> I was losing replica shards about every 10,000 to 30,000
>>>>>>>>> updates, never the primary shard.
>>>>>>>>>
>>>>>>>>> I have 185+ million large JSON documents in 1 index with 100 shards
>>>>>>>>> and 1 replica, so 200 shards total over 6 servers. Each shard is
>>>>>>>>> about 10.4 GB in size.
>>>>>>>>> About 2 TB of data: 1 TB primary, 1 TB replica.
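As a sanity check on the sizing figures above, the arithmetic works out as follows. The input numbers are Eric's. The per-server figures assume shards are spread evenly across the servers, which is an assumption.

```python
# Inputs taken from Eric's message; 185 million is a lower bound ("185 million +").
docs = 185_000_000
primaries = 100
replicas = 1
servers = 6
shard_gb = 10.4

total_shards = primaries * (1 + replicas)   # 200 shards
primary_gb = primaries * shard_gb           # ~1040 GB, i.e. ~1 TB of primaries
total_gb = total_shards * shard_gb          # ~2080 GB, i.e. ~2 TB overall
# Per-server figures assume an even spread across the 6 servers.
shards_per_server = total_shards / servers  # ~33.3
gb_per_server = total_gb / servers          # ~347 GB
docs_per_primary = docs / primaries         # 1.85 million docs per primary

print(f"{total_shards} shards, {total_gb:.0f} GB total, "
      f"~{gb_per_server:.0f} GB and ~{shards_per_server:.1f} shards per server")
# 200 shards, 2080 GB total, ~347 GB and ~33.3 shards per server
```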
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Eric Sites
>>>>>>>>>
>>>>>>>>> From: Boaz Leskes <[email protected]>
>>>>>>>>> Reply-To: <[email protected]>
>>>>>>>>> Date: Monday, August 5, 2013 5:38 PM
>>>>>>>>> To: <[email protected]>
>>>>>>>>> Subject: Re: 0.90.2 _update or _bulk update causing
>>>>>>>>> NullPointerException in logs and I start losing shards
>>>>>>>>>
>>>>>>>>> Hi Eric,
>>>>>>>>>
>>>>>>>>> Glad to hear you solved it. It would be great if you could share the
>>>>>>>>> logs from the failed _update (non-bulk) call. A failed script
>>>>>>>>> shouldn't cause shards to drop, so I would like to research it some
>>>>>>>>> more.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Boaz
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Aug 5, 2013 at 6:40 PM, Eric Sites <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Boaz,
>>>>>>>>>>
>>>>>>>>>> I found and fixed the problem.
>>>>>>>>>>
>>>>>>>>>> I added "lang": "js" to the update JSON, which was not needed
>>>>>>>>>> before in ES 0.90.0.
>>>>>>>>>> I also renamed new_tracking to match the name of the action in the
>>>>>>>>>> params section.
>>>>>>>>>> So, for example, the script now looks like this:
>>>>>>>>>>
>>>>>>>>>> if (ctx._source['tracking'] != null) {
>>>>>>>>>>     if (ctx._source.tracking['some_action'] != null) {
>>>>>>>>>>         ctx._source.tracking.some_action += param1;
>>>>>>>>>>     } else {
>>>>>>>>>>         ctx._source.tracking['some_action'] = 1;
>>>>>>>>>>     }
>>>>>>>>>> } else {
>>>>>>>>>>     ctx._source.tracking = new_some_action;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> "params" : { "param1" : 1, "new_some_action" : { "some_action" : 1 } }
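For reference, a full _update request body along the lines described here combines the script, "lang": "js", the params, and an upsert document. The "lang": "js" entry assumes the JavaScript language plugin is installed. The Python sketch below builds one body per action by substituting the action name, so each script gets its own new_<action> param as in the example above. The update_body helper, the string-substitution approach, and the upsert contents are hypothetical.

```python
import json

# The post-fix script from the message above, re-indented.
SCRIPT = """
if (ctx._source['tracking'] != null) {
    if (ctx._source.tracking['some_action'] != null) {
        ctx._source.tracking.some_action += param1;
    } else {
        ctx._source.tracking['some_action'] = 1;
    }
} else {
    ctx._source.tracking = new_some_action;
}
""".strip()


def update_body(action: str, upsert_doc: dict) -> dict:
    """Hypothetical helper: per-action script with its own new_<action> param."""
    script = SCRIPT.replace("some_action", action)  # also renames new_some_action
    return {
        "script": script,
        "lang": "js",
        "params": {"param1": 1, "new_" + action: {action: 1}},
        # Indexed as-is if the document does not exist yet (script not run).
        "upsert": upsert_doc,
    }


print(json.dumps(update_body("click", {"tracking": {"click": 1}}), indent=2))
```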
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Eric Sites
>>>>>>>>>>
>>>>>>>>>> From: Boaz Leskes <[email protected]>
>>>>>>>>>> Reply-To: <[email protected]>
>>>>>>>>>> Date: Monday, August 5, 2013 10:35 AM
>>>>>>>>>> To: <[email protected]>
>>>>>>>>>> Subject: Re: 0.90.2 _update or _bulk update causing
>>>>>>>>>> NullPointerException in logs and I start losing shards
>>>>>>>>>>
>>>>>>>>>> Hi Eric,
>>>>>>>>>>
>>>>>>>>>> This is interesting. The stack trace in the gist comes from
>>>>>>>>>> the bulk calls. Can you also post one from a failed _update?
>>>>>>>>>> Cross-checking them might help pinpoint the issue.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Boaz
>>>>>>>>>>
>>>>>>>>>> On Monday, August 5, 2013 1:34:16 AM UTC+2, [email protected]
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I am getting a java.lang.NullPointerException in my
>>>>>>>>>>> Elasticsearch cluster logs when I do a _bulk update or just an
>>>>>>>>>>> _update.
>>>>>>>>>>> I am sending a lot of data to my clusters. After I get this
>>>>>>>>>>> error, I lose a shard and it has to be recreated.
>>>>>>>>>>>
>>>>>>>>>>> version 0.90.2
>>>>>>>>>>>
>>>>>>>>>>> gist: https://gist.github.com/EricSites/6152468
>>>>>>>>>>>
>>>>>>>>>>> I get this using the _bulk API or just the normal _update API.
>>>>>>>>>>>
>>>>>>>>>>> My update script is a little complicated.
>>>>>>>>>>> I am adding a tracking object to my document if it does not
>>>>>>>>>>> exist. There should only be one of these, and it should not be an
>>>>>>>>>>> array.
>>>>>>>>>>> If the object does exist, I am trying to add a new field to the
>>>>>>>>>>> tracking object to keep track of counts.
>>>>>>>>>>> So if the field does not exist I create it; otherwise I just += to it.
>>>>>>>>>>>
>>>>>>>>>>> if (ctx._source['tracking'] != null) {
>>>>>>>>>>>     if (ctx._source.tracking['some_action'] != null) {
>>>>>>>>>>>         ctx._source.tracking.some_action += param1;
>>>>>>>>>>>     } else {
>>>>>>>>>>>         ctx._source.tracking['some_action'] = 1;
>>>>>>>>>>>     }
>>>>>>>>>>> } else {
>>>>>>>>>>>     ctx._source.tracking = new_tracking;
>>>>>>>>>>> }
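To reason about the branches of the original script, here is a plain Python mirror of its logic. It is an illustration only, not how Elasticsearch executes scripts, and apply_tracking is a hypothetical name. It makes two details visible. The else branch depends on a new_tracking entry in params. A missing some_action counter is initialized to 1 rather than to param1.

```python
import copy


def apply_tracking(source: dict, params: dict) -> dict:
    """Mirror of the update script's logic; mutates and returns source."""
    tracking = source.get("tracking")
    if tracking is not None:
        if tracking.get("some_action") is not None:
            # Existing counter: add the increment passed as param1.
            tracking["some_action"] += params["param1"]
        else:
            # Tracking object exists but this counter doesn't: start at 1.
            tracking["some_action"] = 1
    else:
        # No tracking object yet: copy in the one supplied via params.
        source["tracking"] = copy.deepcopy(params["new_tracking"])
    return source


params = {"param1": 1, "new_tracking": {"some_action": 1}}
doc = apply_tracking({}, params)
doc = apply_tracking(doc, params)
print(doc)  # {'tracking': {'some_action': 2}}
```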
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Here is my mapping for this:
>>>>>>>>>>> {
>>>>>>>>>>>   "sample" : {
>>>>>>>>>>>     "index_options" : "docs",
>>>>>>>>>>>     "properties" : {
>>>>>>>>>>>       "tracking" : {
>>>>>>>>>>>         "type" : "object",
>>>>>>>>>>>         "dynamic" : true
>>>>>>>>>>>       }
>>>>>>>>>>>     }
>>>>>>>>>>>   }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>> Google Groups "elasticsearch" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>>> send an email to [email protected].
>>>>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>
>>>
>>
>
>