Re: Losing data after Elasticsearch restart

Alexander Reelsen Fri, 20 Jun 2014 00:18:25 -0700

Hey,

The exception you showed can happen when you remove an alias. However, you
mentioned a NullPointerException in your first post, which is not contained
in the stack trace, so it seems that one is still missing.
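
A quick way to check which exception types a pasted log excerpt actually
contains (for example, whether NullPointerException shows up at all) is a
small script along these lines. This is only a minimal sketch in Python: the
function names and the regular expression are illustrative assumptions, not
part of Elasticsearch or its tooling, and the pattern simply looks for Java
class names ending in "Exception" or "Error".

```python
import re

# Matches a Java class name ending in Exception/Error, with an optional
# lowercase package prefix such as "org.elasticsearch.indices.".
EXCEPTION_RE = re.compile(
    r"\b(?:[a-z_][\w$]*\.)*([A-Z]\w*(?:Exception|Error))\b"
)


def exception_names(log_text):
    """Return the distinct exception class names in log_text, first seen first."""
    seen = []
    for match in EXCEPTION_RE.finditer(log_text):
        name = match.group(1)  # simple class name without the package
        if name not in seen:
            seen.append(name)
    return seen


def contains_exception(log_text, name):
    """Return True if the simple class name occurs anywhere in log_text."""
    return name in exception_names(log_text)
```

Calling contains_exception(open(path).read(), "NullPointerException") on each
node's log file (path being wherever that node writes its log) would show
which node logged the missing trace.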


Also, please retry with a newer version of Elasticsearch.


--Alex


On Thu, Jun 19, 2014 at 5:13 AM, Rohit Jaiswal <[email protected]>
wrote:

> Hi Alexander,
>                We sent you the stack trace. Can you please enlighten us on
> this?
>
> Thanks,
> Rohit
>
>
> On Mon, Jun 16, 2014 at 10:25 AM, Rohit Jaiswal <[email protected]>
> wrote:
>
>> Hi Alexander,
>>                         Thanks for your reply. We plan to upgrade in the
>> long run, however we need to fix the data loss problem on 0.90.2 in the
>> immediate term.
>>
>> Here is the stack trace -
>>
>>
>> 10:09:37.783 PM
>>
>> [22:09:37,783][WARN ][indices.cluster          ] [Storm]
>> [b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
>> org.elasticsearch.indices.recovery.RecoveryFailedException:
>> [b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey
>> Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into
>> [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
>>     at
>> org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
>>     at
>> org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
>>     at
>> org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
>> Source)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>     at java.lang.Thread.run(Unknown Source)
>> Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey
>> Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
>> Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
>> [b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
>>     at
>> org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
>>     at
>> org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
>>     at
>> org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
>>     at
>> org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
>>     at
>> org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
>>     at
>> org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
>>     at
>> org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
>> Source)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>     at java.lang.Thread.run(Unknown Source)
>> Caused by: org.elasticsearch.transport.RemoteTransportException:
>> [Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
>> Caused by: org.elasticsearch.indices.InvalidAliasNameException:
>> [b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name
>> [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
>> alias name was passed to alias Filter
>>     at
>> org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
>>     at
>> org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
>>     at
>> org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
>>     at
>> org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
>>     at
>> org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
>>     at
>> org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
>> Source)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>     at java.lang.Thread.run(Unknown Source)
>> [22:09:37,799][WARN ][cluster.action.shard     ] [Storm] sending failed
>> shard for [b7a76aa06cfd4048987d1117f3e0433a][0],
>> node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start
>> shard, message
>> [RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery
>> failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]]
>> into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested:
>> RemoteTransportException[[Jeffrey 
>> Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]];
>> nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0]
>> Phase[2] Execution failed]; nested:
>> RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]];
>> nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]
>> Invalid alias name
>> [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
>> alias name was passed to alias Filter]; ]]
>> [22:09:38,025][WARN ][indices.cluster          ] [Storm]
>> [b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
>> org.elasticsearch.indices.recovery.RecoveryFailedException:
>> [b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey
>> Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into
>> [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
>>     at
>> org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
>>     at
>> org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
>>     at
>> org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
>> Source)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>     at java.lang.Thread.run(Unknown Source)
>> Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey
>> Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
>> Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
>> [b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
>>     at
>> org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
>>     at
>> org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
>>     at
>> org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
>>     at
>> org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
>>     at
>> org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
>>     at
>> org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
>>     at
>> org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
>> Source)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>     at java.lang.Thread.run(Unknown Source)
>> Caused by: org.elasticsearch.transport.RemoteTransportException:
>> [Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
>> Caused by: org.elasticsearch.indices.InvalidAliasNameException:
>> [b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name
>> [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
>> alias name was passed to alias Filter
>>     at
>> org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
>>     at
>> org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
>>     at
>> org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
>>     at
>> org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
>>     at
>> org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
>>     at
>> org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
>> Source)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>     at java.lang.Thread.run(Unknown Source)
>>
>> [22:09:38,042][WARN ][cluster.action.shard     ] [Storm] sending failed
>> shard for [b7a76aa06cfd4048987d1117f3e0433a][0],
>> node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start
>> shard, message
>> [RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery
>> failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]]
>> into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested:
>> RemoteTransportException[[Jeffrey 
>> Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]];
>> nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0]
>> Phase[2] Execution failed]; nested:
>> RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]];
>> nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a]
>> Invalid alias name
>> [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown
>> alias name was passed to alias Filter]; ]]
>>
>>
>> Let us know..
>>
>> Thanks,
>> Rohit
>>
>>
>> On Mon, Jun 16, 2014 at 6:13 AM, Alexander Reelsen <[email protected]>
>> wrote:
>>
>>> Hey,
>>>
>>> Without stack traces it is pretty hard to see the actual problem. Do you
>>> have them around? The exception happened on one of the nodes, so it
>>> should have been logged in that node's Elasticsearch log file as well.
>>> Also, you should really upgrade if possible, as releases after 0.90.2
>>> have seen many improvements.
>>>
>>>
>>> --Alex
>>>
>>>
>>> On Mon, Jun 9, 2014 at 4:15 AM, Rohit Jaiswal <[email protected]>
>>> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> We lost data after restarting our Elasticsearch cluster. Restarting is
>>>> part of deploying our software stack.
>>>>
>>>> We have a 20-node cluster running 0.90.2, and we have Splunk configured
>>>> to index the ES logs.
>>>>
>>>> Looking at the Splunk logs, we found the following *error a day before
>>>> the deployment* (restart) -
>>>>
>>>> [cluster.action.shard     ] [Rictor] sending failed shard
>>>> for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw],
>>>> [R], s[STARTED], reason
>>>> [Failed to perform [bulk/shard] on replica, message
>>>> [RemoteTransportException; nested:
>>>> ResponseHandlerFailureTransportException; nested: NullPointerException; ]]
>>>>
>>>> [cluster.action.shard     ] [Kiss] received shard failed
>>>> for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw],
>>>> [R], s[STARTED], reason
>>>> [Failed to perform [bulk/shard] on replica, message
>>>> [RemoteTransportException; nested:
>>>> ResponseHandlerFailureTransportException; nested: NullPointerException; ]]
>>>>
>>>> Further, *a day after the deploy*, we see the same errors on another
>>>> node -
>>>>
>>>> [cluster.action.shard     ] [Contrary] received shard
>>>> failed for [a58f9413315048ecb0abea48f5f6aae7][1],
>>>> node[3UbHwVCkQvO3XroIl-awPw], [R], s[STARTED], reason
>>>> [Failed to perform [bulk/shard] on replica, message
>>>> [RemoteTransportException; nested:
>>>> ResponseHandlerFailureTransportException; nested: NullPointerException; ]]
>>>>
>>>> *Immediately after that, the following error is seen*. It shows up
>>>> repeatedly on a couple of other nodes as well -
>>>>
>>>> failed to start shard
>>>>
>>>> [cluster.action.shard     ] [Copperhead] sending failed
>>>> shard for [a58f9413315048ecb0abea48f5f6aae7][0],
>>>> node[EuRzr3MLQiSS6lzTZJbiKw], [R], s[INITIALIZING],
>>>> reason [Failed to start shard, message
>>>> [RecoveryFailedException[[a58f9413315048ecb0abea48f5f6aae7][0]: Recovery
>>>> failed from [Frank Castle][dlv2mPypQaOxLPQhHQ67Fw][inet[/10.2.136.81:9300]]
>>>> into [Copperhead][EuRzr3MLQiSS6lzTZJbiKw][inet[/10.3.207.55:9300]]]; nested:
>>>> RemoteTransportException[[Frank
>>>> Castle][inet[/10.2.136.81:9300]][index/shard/recovery/startRecovery]];
>>>> nested: RecoveryEngineException[[a58f9413315048ecb0abea48f5f6aae7][0]
>>>> Phase[2] Execution failed]; nested:
>>>> RemoteTransportException[[Copperhead][inet[/10.3.207.55:9300]][index/shard/recovery/translogOps]];
>>>> nested: InvalidAliasNameException[[a58f9413315048ecb0abea48f5f6aae7]
>>>> *Invalid alias name
>>>> [fbf1e55418a2327d308e7632911f9bb8bfed58059dd7f1e4abd3467c5f8519c3],
>>>> Unknown alias name was passed to alias Filter*]; ]]
>>>>
>>>>
>>>> *During this time, we could not access previously indexed documents.*
>>>>
>>>> I looked up the alias error, and it looks related to
>>>> https://github.com/elasticsearch/elasticsearch/issues/1198 ("Delete By
>>>> Query wrongly persisted to translog", #1198). That should have been
>>>> fixed in ES 0.18.0, and we are using 0.90.2, so why is ES encountering
>>>> this issue?
>>>>
>>>> What do we need to do to set this right and get back the lost data?
>>>> Please help.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>  --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>>
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
>>
>>
>

