Re: Index recovery failure on node restart since v1.3.x

Ankush Jhalani Wed, 08 Oct 2014 08:08:48 -0700

Thanks. It's difficult to reproduce without the data, but I will try asking on 
GitHub.


On Wednesday, October 8, 2014 6:04:52 AM UTC-4, Thibaut wrote:
>
> Hi,
>
> I would open an issue on GitHub. Even if it's just one node, 
> Elasticsearch should restart.
>
> Thanks,
> Thibaut
>
> On Tue, Oct 7, 2014 at 11:03 PM, Ankush Jhalani <[email protected]> wrote:
>
>> Well it's a shared resource (not prod), used for other stuff and due to 
>> historical/enterprise reasons it's bounced every week. Though not ideal, I 
>> expect ES to be able to restart without issues. 
>>
>> On Tuesday, October 7, 2014 5:01:15 PM UTC-4, Mark Walkom wrote:
>>>
>>> Why are you restarting the node every week?
>>> That sounds like a problem you should solve to stop this one happening.
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: [email protected]
>>> web: www.campaignmonitor.com
>>>
>>> On 8 October 2014 07:56, Ankush Jhalani <[email protected]> wrote:
>>>
>>>> We have a single-node ES instance, which is restarted once a week. 
>>>> Every time it's restarted, recovery of one specific index gets stuck at:
>>>>
>>>>> [2014-10-06 22:47:48,107][DEBUG][index.translog           ] [testnode] [testindex_20140930][0] interval [5s], flush_threshold_ops [2147483647], flush_threshold_size [200mb], flush_threshold_period [30m]
>>>>> [2014-10-06 22:47:48,108][DEBUG][index.shard.service      ] [testnode] [testindex_20140930][0] state: [CREATED]->[RECOVERING], reason [from gateway]
>>>>> [2014-10-06 22:47:48,108][DEBUG][index.gateway            ] [testnode] [testindex_20140930][0] starting recovery from local ...
>>>>> [2014-10-06 22:47:48,203][DEBUG][index.engine.internal    ] [testnode] [testindex_20140930][0] starting engine
>>>>>
>>>> We have to delete that index for recovery to complete. A hot threads 
>>>> dump gives the following output:
>>>> ::: [testnode.node][ff9m9KnRSqWfkrTZiAMbsA][testnode][inet[/10.126.143.197:9301]]{datacenter=nj, master=true}
>>>>
>>>>    102.9% (514.3ms out of 500ms) cpu usage by thread 'elasticsearch[testnode.node][generic][T#2]'
>>>>      10/10 snapshots sharing following 14 elements
>>>>        org.elasticsearch.index.engine.internal.InternalEngine$SearchFactory.newSearcher(InternalEngine.java:1574)
>>>>        org.apache.lucene.search.SearcherManager.getSearcher(SearcherManager.java:160)
>>>>        org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:122)
>>>>        org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
>>>>        org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
>>>>        org.apache.lucene.search.ReferenceManager.maybeRefresh(ReferenceManager.java:225)
>>>>        org.elasticsearch.index.engine.internal.InternalEngine.refresh(InternalEngine.java:779)
>>>>        org.elasticsearch.index.engine.internal.InternalEngine.delete(InternalEngine.java:686)
>>>>        org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:780)
>>>>        org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:250)
>>>>        org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
>>>>        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>>>        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>>>        java.lang.Thread.run(Thread.java:722)
>>>>    
>>>>
>>>>
>>>> We started seeing this error after upgrading to v1.3.2, and it still 
>>>> happens with v1.3.4. Could someone advise what could be happening? 
>>>> Thanks.
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected].
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/elasticsearch/584e7b07-0957-49ca-b67a-3f8dc281312a%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>
>
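
For anyone comparing dumps across restarts, here is a minimal sketch (not part of the original exchange) that parses hot threads text like the dump quoted above and flags threads whose stack passes through `LocalIndexShardGateway.recover`. It assumes the dump has been unwrapped to one frame per line, as the nodes hot threads API (`GET /_nodes/hot_threads`) returns it. The names `parse_hot_threads`, `on_local_recovery_path` and `SAMPLE` are hypothetical helpers, and the sample is an abbreviated excerpt of the trace above.

```python
import re

# Per-thread header line of a hot threads dump, e.g.
#   102.9% (514.3ms out of 500ms) cpu usage by thread 'elasticsearch[...]'
HEADER = re.compile(
    r"^\s*(?P<pct>\d+(?:\.\d+)?)% \((?P<used>\S+) out of (?P<interval>\S+)\) "
    r"cpu usage by thread '(?P<thread>[^']+)'"
)
# A single stack frame, e.g. java.lang.Thread.run(Thread.java:722)
FRAME = re.compile(r"^\s*(?P<frame>[\w.$]+\([^)]*\))\s*$")


def parse_hot_threads(text):
    """Return one dict per thread: cpu_percent, thread name, list of frames."""
    threads = []
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = {
                "cpu_percent": float(header.group("pct")),
                "thread": header.group("thread"),
                "frames": [],
            }
            threads.append(current)
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            current["frames"].append(frame.group("frame"))
    return threads


def on_local_recovery_path(thread):
    """True if any frame is inside LocalIndexShardGateway.recover."""
    return any("LocalIndexShardGateway.recover" in f for f in thread["frames"])


# Abbreviated excerpt of the dump from this thread (4 of the 14 frames).
SAMPLE = """\
   102.9% (514.3ms out of 500ms) cpu usage by thread 'elasticsearch[testnode.node][generic][T#2]'
     10/10 snapshots sharing following 14 elements
       org.elasticsearch.index.engine.internal.InternalEngine.refresh(InternalEngine.java:779)
       org.elasticsearch.index.engine.internal.InternalEngine.delete(InternalEngine.java:686)
       org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:780)
       org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:250)
"""

if __name__ == "__main__":
    for t in parse_hot_threads(SAMPLE):
        print(t["cpu_percent"], t["thread"], on_local_recovery_path(t))
```

Running this against several dumps taken a few seconds apart would show whether the same thread stays on the recovery path, which may be useful detail for a GitHub issue.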

