Lucene index corruption on node restarts

Andrey Perminov Sat, 22 Mar 2014 06:05:16 -0700

We are running a small Elasticsearch cluster of three nodes, version 1.0.1.
Each node has 7 GB of RAM. Our software creates a daily index for storing its
data, and each daily index is around 5 GB. Unfortunately, for some reason,
Elasticsearch eats up all the RAM and hangs the node, even though the maximum
heap size is set to 6 GB. So we decided to use monit to restart it when memory
usage reaches 90%. This works, but sometimes we get errors like this:


[2014-03-22 16:56:04,943][DEBUG][action.search.type       ] [es-00] [product-22-03-2014][0], node[jbUDVzuvS5GTM7iOG8iwzQ], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@687dc039]
org.elasticsearch.search.fetch.FetchPhaseExecutionException: [product-22-03-2014][0]: query[filtered(ToParentBlockJoinQuery (filtered(history.created:[1392574921000 TO *])->cache(_type:__history)))->cache(_type:product)],from[0],size[1000],sort[<custom:"history.created": org.elasticsearch.index.search.nested.NestedFieldComparatorSource@15e4ece9>]: Fetch Failed [Failed to fetch doc id [7263214]]
        at org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:230)
        at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:156)
        at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:332)
        at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:304)
        at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:71)
        at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
        at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4.run(TransportSearchTypeAction.java:292)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException: seek past EOF: MMapIndexInput(path="/opt/elasticsearch/main/nodes/0/indices/product-22-03-2014/0/index/_9lz.fdt")
        at org.apache.lucene.store.ByteBufferIndexInput.seek(ByteBufferIndexInput.java:174)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:229)
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:276)
        at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:110)
        at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:196)
        at org.elasticsearch.search.fetch.FetchPhase.loadStoredFields(FetchPhase.java:228)
        ... 9 more
[2014-03-22 16:56:04,944][DEBUG][action.search.type       ] [es-00] All shards failed for phase: [query_fetch]

According to our logs, this seems to happen when one or two nodes get
restarted. Stranger still, the same shard got corrupted on all nodes of the
cluster. Why could this happen, and how can we fix it? Can you also suggest
how to fix the memory usage?
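For reference, a quick sanity check of the memory budget implied by the numbers above (7 GB RAM per node, 6 GB maximum heap, monit restart at 90%). This is only arithmetic under an assumption: that monit's 90% limit is measured against the node's 7 GB of physical RAM, and that the figure it watches may include memory used outside the Java heap. The class and method names are hypothetical, for illustration only.

```java
import java.util.Locale;

/**
 * Back-of-the-envelope memory budget for one node, using the figures
 * from the post. Names are hypothetical; this is not an Elasticsearch API.
 */
public class HeadroomCheck {

    /** Memory level (GB) at which a percentage-based restart rule fires. */
    static double restartThresholdGb(double ramGb, double limitFraction) {
        return ramGb * limitFraction;
    }

    /** Memory (GB) left for everything outside the Java heap before the rule fires. */
    static double nonHeapBudgetGb(double ramGb, double heapMaxGb, double limitFraction) {
        return restartThresholdGb(ramGb, limitFraction) - heapMaxGb;
    }

    public static void main(String[] args) {
        double ramGb = 7.0;      // physical RAM per node (from the post)
        double heapMaxGb = 6.0;  // configured maximum heap (from the post)
        double limit = 0.90;     // monit restart threshold (from the post)

        // Locale.ROOT keeps the decimal separator a '.' regardless of system locale.
        System.out.printf(Locale.ROOT, "restart threshold: %.1f GB%n",
                restartThresholdGb(ramGb, limit));
        System.out.printf(Locale.ROOT, "budget outside the heap: %.1f GB%n",
                nonHeapBudgetGb(ramGb, heapMaxGb, limit));
    }
}
```

With these figures, the restart fires at about 6.3 GB, leaving roughly 0.3 GB for anything the process uses beyond its 6 GB heap.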
