Hi, the full stack trace is below (from the outstanding-tasks API). We are using
ES 1.4.1.
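As a side note, the entries below came from a saved copy of the pending-tasks output. A small illustrative sketch like this (the field names simply mirror the output format shown below; this is not any official client API) can tabulate the key fields from such a dump:

```python
import re

def summarize_pending_tasks(text):
    """Extract (insert_order, priority, source) tuples from a saved
    copy of the pending-tasks output, one tuple per queued task."""
    tasks = []
    # Each task entry begins with an "insert_order : N" line, so split
    # the dump at those boundaries (lookahead keeps the delimiter).
    for block in re.split(r"(?=insert_order\s*:)", text):
        order = re.search(r"insert_order\s*:\s*(\d+)", block)
        prio = re.search(r"priority\s*:\s*(\S+)", block)
        src = re.search(r"source\s*:\s*(.+)", block)
        if order and prio:
            tasks.append((int(order.group(1)), prio.group(1),
                          src.group(1).strip() if src else ""))
    return tasks
```

Feeding it the dump below yields one row per queued task, which makes it easier to see that all three failures are shard-failed events on the same two indices.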
insert_order : 69862
priority : HIGH
source : shard-failed ([agora_v1][24], node[SEIBtFznTtGpLFPgCLgW4w], [R], s[INITIALIZING]), reason [Failed to start shard, message [CorruptIndexException[[agora_v1][24] Preexisting corrupted index [corrupted_LrKHKRF7Q2KuL15TT_hPvw] caused by: CorruptIndexException[Read past EOF while reading segment infos]
EOFException[read past EOF: MMapIndexInput(path="D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\24\index\segments_6w")]
org.apache.lucene.index.CorruptIndexException: Read past EOF while reading segment infos
    at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:127)
    at org.elasticsearch.index.store.Store.access$400(Store.java:80)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:575)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:568)
    at org.elasticsearch.index.store.Store.getMetadata(Store.java:186)
    at org.elasticsearch.index.store.Store.getMetadataOrEmpty(Store.java:150)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:152)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:138)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:59)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:278)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:269)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException: read past EOF: MMapIndexInput(path="D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\24\index\segments_6w")
    at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:81)
    at org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:41)
    at org.apache.lucene.store.DataInput.readInt(DataInput.java:98)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:454)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:906)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:752)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:450)
    at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:85)
    at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:124)
    ... 14 more
]]]
executing : True
time_in_queue_millis : 52865
time_in_queue : 52.8s
insert_order : 69863
priority : HIGH
source : shard-failed ([agora_v1][24], node[SEIBtFznTtGpLFPgCLgW4w], [R], s[INITIALIZING]), reason [engine failure, message [corrupted preexisting index][CorruptIndexException[[agora_v1][24] Preexisting corrupted index [corrupted_LrKHKRF7Q2KuL15TT_hPvw] caused by: CorruptIndexException[Read past EOF while reading segment infos]
EOFException[read past EOF: MMapIndexInput(path="D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\24\index\segments_6w")]
org.apache.lucene.index.CorruptIndexException: Read past EOF while reading segment infos
    at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:127)
    at org.elasticsearch.index.store.Store.access$400(Store.java:80)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:575)
    at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:568)
    at org.elasticsearch.index.store.Store.getMetadata(Store.java:186)
    at org.elasticsearch.index.store.Store.getMetadataOrEmpty(Store.java:150)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:152)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:138)
    at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:59)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:278)
    at org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:269)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException: read past EOF: MMapIndexInput(path="D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\24\index\segments_6w")
    at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:81)
    at org.apache.lucene.store.BufferedChecksumIndexInput.readByte(BufferedChecksumIndexInput.java:41)
    at org.apache.lucene.store.DataInput.readInt(DataInput.java:98)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:454)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:906)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:752)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:450)
    at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:85)
    at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:124)
    ... 14 more
]]]
executing : False
time_in_queue_millis : 52862
time_in_queue : 52.8s
insert_order : 69865
priority : HIGH
source : shard-failed ([kibana-int][88], node[adjp-WHHSP6kWEiPd3HkeQ], [R], s[INITIALIZING]), reason [Failed to start shard, message [RecoveryFailedException[[kibana-int][88]: Recovery failed from [Quasimodo][spfLOfnjTeiGwrYPMIiRjg][CH1SCH060021734][inet[/10.46.208.169:9300]] into [Hyperion][adjp-WHHSP6kWEiPd3HkeQ][CH1SCH050051642][inet[/10.46.216.169:9300]]]; nested: RemoteTransportException[[Quasimodo][inet[/10.46.208.169:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[kibana-int][88] Phase[1] Execution failed]; nested: RecoverFilesRecoveryException[[kibana-int][88] Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\kibana-int\88\index\segments_2]; ]]
executing : False
time_in_queue_millis : 52860
time_in_queue : 52.8s
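The "read past EOF while reading segment infos" in the traces above points at a truncated segments_N file. Not a fix, but as a rough first sanity check (an illustrative sketch only; the proper tool for diagnosing this is Lucene's CheckIndex), one can at least confirm whether a segments_N file still begins with Lucene's 4-byte codec magic, which Lucene 4.x (bundled with ES 1.4.1) writes at the start of the file:

```python
import struct

# CodecUtil.CODEC_MAGIC in Lucene; segments_N files in Lucene 4.x
# start with this value written as a big-endian 32-bit integer.
CODEC_MAGIC = 0x3FD76C17

def header_intact(path):
    """Return True if the file begins with Lucene's codec magic.
    A short read or a different value suggests truncation/corruption."""
    with open(path, "rb") as f:
        header = f.read(4)
    return len(header) == 4 and struct.unpack(">I", header)[0] == CODEC_MAGIC
```

A file that fails this check (as the EOFException during DataInput.readInt suggests here) is damaged at the very front; a file that passes it can still be corrupt further in, so this only narrows things down.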
On Friday, January 9, 2015 at 5:50:44 PM UTC+5:30, Robert Muir wrote:
>
> Why did you snip the stack trace? Can you provide all the information?
>
> On Thu, Jan 8, 2015 at 10:37 PM, Darshat <[email protected]>
> wrote:
> > Hi,
> > We have a 98-node ES cluster, each node with 32 GB RAM; 16 GB is reserved
> > for ES via the config file. The index has 98 shards with 2 replicas.
> >
> > On this cluster we are loading a large number of documents (about 10
> > billion when done). In this use case about 40 million documents are
> > generated per hour, and we are pre-loading several days' worth of
> > documents to prototype how ES will scale and what its query performance
> > looks like.
> >
> > Right now we are facing problems getting the data loaded. Indexing is
> > turned off. We use the NEST client with a batch size of 10k. To speed up
> > the data load, we distribute the hourly data across each of the 98 nodes
> > to insert in parallel. This worked OK for a few hours, until we had 4.5B
> > documents in the cluster.
> >
> > After that the cluster state went red. The outstanding-tasks CAT API
> > shows errors like the ones below. CPU/disk/memory all look fine on the
> > nodes.
> >
> > Why are we getting these errors? Any help is greatly appreciated, since
> > this blocks prototyping ES for our use case.
> >
> > thanks
> > Darshat
> >
> > Sample errors:
> >
> > source : shard-failed ([agora_v1][24], node[00ihc1ToRiqMDJ1lou1Sig], [R], s[INITIALIZING]), reason [Failed to start shard, message [RecoveryFailedException[[agora_v1][24]: Recovery failed from [Shingen Harada][RDAwqX9yRgud9f7YtZAJPg][CH1SCH060051438][inet[/10.46.153.84:9300]] into [Elfqueen][00ihc1ToRiqMDJ1lou1Sig][CH1SCH050053435][inet[/10.46.182.106:9300]]]; nested: RemoteTransportException[[Shingen Harada][inet[/10.46.153.84:9300]][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[[agora_v1][24] Phase[1] Execution failed]; nested: RecoverFilesRecoveryException[[agora_v1][24] Failed to transfer [0] files with total size of [0b]]; nested: NoSuchFileException[D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\24\index\segments_6r]; ]]
> >
> > AND
> >
> > source : shard-failed ([agora_v1][95], node[PUsHFCStRaecPA6MuvJV9g], [P], s[INITIALIZING]), reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[agora_v1][95] failed to fetch index version after copying it over]; nested: CorruptIndexException[[agora_v1][95] Preexisting corrupted index [corrupted_1wegvS7BSKSbOYQkX9zJSw] caused by: CorruptIndexException[Read past EOF while reading segment infos]
> > EOFException[read past EOF: MMapIndexInput(path="D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\95\index\segments_11j")]
> > org.apache.lucene.index.CorruptIndexException: Read past EOF while reading segment infos
> >     at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:127)
> >     at org.elasticsearch.index.store.Store.access$400(Store.java:80)
> >     at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:575)
> > ---snip more stack trace-----
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/301a744c-6dfa-44f7-95ca-1ca007634d37%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.