Corruption when indexing a large number of documents (4 billion+)

Darshat Shah Thu, 08 Jan 2015 19:45:47 -0800

Hi, 
We have a 98-node ES cluster; each node has 32GB of RAM, of which 16GB is 
reserved for ES via the config file. The index has 98 shards with 2 replicas.


On this cluster we are loading a large number of documents (about 10 
billion when done). About 40 million documents are generated per hour, and 
we are pre-loading several days' worth of documents to prototype how ES 
scales and how well it performs on queries. 
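For scale, the figures above work out roughly as follows. This is a quick sketch. It assumes documents spread evenly over the 98 primary shards; in practice routing makes the spread only approximately even.

```python
# Back-of-the-envelope sizing from the figures quoted above.
# Assumption: documents are spread evenly over the primary shards.
NODES = 98
PRIMARY_SHARDS = 98
REPLICAS = 2
TARGET_DOCS = 10_000_000_000
DOCS_PER_HOUR = 40_000_000

total_shard_copies = PRIMARY_SHARDS * (1 + REPLICAS)  # primaries plus replicas
copies_per_node = total_shard_copies / NODES
docs_per_primary = TARGET_DOCS // PRIMARY_SHARDS
hours_to_target = TARGET_DOCS / DOCS_PER_HOUR

print(total_shard_copies, copies_per_node)  # 294 3.0
print(docs_per_primary)                     # 102040816
print(hours_to_target)                      # 250.0
```

In other words, each node holds about three shard copies. Each primary ends up with roughly 100 million documents, and the full target is about 250 hours (a little over 10 days) of generated data.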

Right now we are having problems pre-loading the data. Indexing is turned 
off. We use the NEST client with a batch size of 10k. To speed up the load, 
we distribute each hour's data across all 98 nodes and insert in parallel. 
This worked OK for a few hours, until the cluster reached 4.5B documents. 
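The loading scheme described above (split each hour's data across the nodes, then send it in bulk requests of 10k documents) can be sketched like this. The sketch is in Python and does not use NEST. `assign_to_nodes`, `batches`, and `bulk_body` are hypothetical helpers, and the node names and the type name `doc` are placeholders. Only the newline-delimited action/source layout follows the Elasticsearch `_bulk` API format.

```python
import json
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Sequence

BATCH_SIZE = 10_000  # documents per bulk request, as described above


def assign_to_nodes(docs: Sequence[dict], nodes: Sequence[str]) -> Dict[str, List[dict]]:
    """Round-robin one hour's documents over the loader nodes (hypothetical helper)."""
    buckets: Dict[str, List[dict]] = {node: [] for node in nodes}
    for i, doc in enumerate(docs):
        buckets[nodes[i % len(nodes)]].append(doc)
    return buckets


def batches(docs: Iterable[dict], size: int = BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive chunks of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_body(chunk: Iterable[dict], index: str = "agora_v1", doc_type: str = "doc") -> str:
    """Build a newline-delimited _bulk request body; `doc_type` is a placeholder."""
    lines = []
    for doc in chunk:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline


# Example: 25,000 documents split over two hypothetical loader nodes.
nodes = ["loader-1", "loader-2"]
docs = [{"id": i} for i in range(25_000)]
per_node = assign_to_nodes(docs, nodes)
sizes = [len(chunk) for chunk in batches(per_node["loader-1"])]
print(sizes)  # [10000, 2500]
```

Each loader receives 12,500 documents, which go out as one full 10k bulk request plus a 2,500-document remainder.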

After that the cluster state went red. The outstanding tasks CAT API shows 
errors like the ones below. CPU, disk, and memory seem to be fine on the nodes. 

Why are we getting these errors, and is there a best practice for this? Any 
help is greatly appreciated, since this is blocking us from prototyping ES 
for our use case. 

Thanks 
Darshat 

Sample errors: 

source               : shard-failed ([agora_v1][24],
                       node[00ihc1ToRiqMDJ1lou1Sig], [R], s[INITIALIZING]),
                       reason [Failed to start shard, message
                       [RecoveryFailedException[[agora_v1][24]: Recovery
                       failed from [Shingen Harada][RDAwqX9yRgud9f7YtZAJPg][CH1SCH060051438][inet[/10.46.153.84:9300]]
                       into [Elfqueen][00ihc1ToRiqMDJ1lou1Sig][CH1SCH050053435][inet[/10.46.182.106:9300]]];
                       nested: RemoteTransportException[[Shingen Harada][inet[/10.46.153.84:9300]][internal:index/shard/recovery/start_recovery]];
                       nested: RecoveryEngineException[[agora_v1][24] Phase[1] Execution failed];
                       nested: RecoverFilesRecoveryException[[agora_v1][24] Failed to transfer [0] files with total size of [0b]];
                       nested: NoSuchFileException[D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\24\index\segments_6r]; ]]


AND 

source               : shard-failed ([agora_v1][95],
                       node[PUsHFCStRaecPA6MuvJV9g], [P], s[INITIALIZING]),
                       reason [Failed to start shard, message
                       [IndexShardGatewayRecoveryException[[agora_v1][95]
                       failed to fetch index version after copying it over];
                       nested: CorruptIndexException[[agora_v1][95]
                       Preexisting corrupted index
                       [corrupted_1wegvS7BSKSbOYQkX9zJSw] caused by:
                       CorruptIndexException[Read past EOF while reading
                       segment infos]
                           EOFException[read past EOF: MMapIndexInput(path="D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\95\index\segments_11j")]
                       org.apache.lucene.index.CorruptIndexException: Read past EOF while reading segment infos
                           at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:127)
                           at org.elasticsearch.index.store.Store.access$400(Store.java:80)
                           at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:575)
---snip more stack trace-----  
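With 98 shards and 2 replicas, the pending-task list can contain many entries like these. A small parser can summarize which shard copies failed and on which nodes, for example to see whether the failures cluster on a few machines. The sketch below is a hypothetical helper, not part of ES or NEST. It assumes only the `shard-failed ([index][shard], node[...], [P|R], s[STATE])` prefix format shown in the samples above.

```python
import re
from typing import Dict, List

# Matches the "shard-failed (...)" prefix seen in the pending-task samples above;
# \s* tolerates the line wrapping the task output may contain.
SHARD_FAILED = re.compile(
    r"shard-failed \(\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\],\s*"
    r"node\[(?P<node>[^\]]+)\],\s*\[(?P<role>[PR])\],\s*s\[(?P<state>[A-Z_]+)\]\)"
)


def parse_shard_failures(text: str) -> List[Dict[str, object]]:
    """Extract index, shard number, node id, copy type and state from pasted task text."""
    failures = []
    for m in SHARD_FAILED.finditer(text):
        failures.append({
            "index": m.group("index"),
            "shard": int(m.group("shard")),
            "node": m.group("node"),
            "role": "primary" if m.group("role") == "P" else "replica",
            "state": m.group("state"),
        })
    return failures


sample = """source : shard-failed ([agora_v1][24],
           node[00ihc1ToRiqMDJ1lou1Sig], [R], s[INITIALIZING]),
source : shard-failed ([agora_v1][95],
           node[PUsHFCStRaecPA6MuvJV9g], [P], s[INITIALIZING]),"""

for failure in parse_shard_failures(sample):
    print(failure["index"], failure["shard"], failure["role"], failure["node"])
```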
