Why did you snip the stack trace? Can you provide the complete output?

On Thu, Jan 8, 2015 at 10:37 PM, Darshat <[email protected]> wrote:
> Hi,
> We have a 98-node ES cluster; each node has 32GB of RAM, 16GB of which is
> reserved for ES via the config file. The index has 98 shards with 2 replicas.
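>
> For reference, the memory and index settings amount to roughly the
> following (paraphrased, not copied verbatim from our Windows boxes):
>
>     # environment variable read by the ES startup scripts
>     ES_HEAP_SIZE=16g
>
>     # index creation settings
>     PUT /agora_v1
>     { "settings": { "number_of_shards": 98, "number_of_replicas": 2 } }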
>
> On this cluster we are loading a large number of documents (about 10
> billion when done). In this use case about 40 million documents are
> generated per hour, and we are pre-loading several days' worth of documents
> to prototype how ES will scale and what its query performance will be.
>
> Right now we are facing problems getting the data loaded. Indexing is
> turned off. We use the NEST client with a batch size of 10k; a sketch of
> the loader loop follows this paragraph. To speed up the load, we distribute
> each hour's data across the 98 nodes and insert in parallel. This worked OK
> for a few hours, until we reached 4.5B documents in the cluster.
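>
> For context, each node's loader does roughly this (a minimal NEST 1.x
> sketch; the URL, the Doc type, and the NextHourlyBatch helper are
> illustrative, not our exact code):
>
>     using System;
>     using System.Collections.Generic;
>     using Nest;
>
>     var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
>         .SetDefaultIndex("agora_v1");
>     var client = new ElasticClient(settings);
>
>     // One 10k-document batch, sent through the _bulk endpoint.
>     IEnumerable<Doc> batch = NextHourlyBatch();   // hypothetical helper
>     var response = client.Bulk(b => b.IndexMany(batch));
>     if (!response.IsValid)
>     {
>         // Per-item failures come back in the bulk response; log and retry.
>     }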
>
> After that the cluster state went red. The pending-tasks cat API shows
> errors like the ones below (the endpoints we query are listed after this
> paragraph). CPU, disk, and memory all look fine on the nodes.
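>
> The endpoints in question (standard ES 1.4 monitoring APIs, listed here for
> completeness):
>
>     GET /_cat/pending_tasks?v
>     GET /_cat/shards?v
>     GET /_cluster/health?level=indices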
>
> Why are we getting these errors?. any help greatly appreciated since this
> blocks prototyping ES for our use case.
>
> thanks
> Darshat
>
> Sample errors:
>
> source : shard-failed ([agora_v1][24], node[00ihc1ToRiqMDJ1lou1Sig], [R],
>     s[INITIALIZING]), reason [Failed to start shard, message
>     [RecoveryFailedException[[agora_v1][24]: Recovery failed from [Shingen
>     Harada][RDAwqX9yRgud9f7YtZAJPg][CH1SCH060051438][inet[/10.46.153.84:9300]]
>     into [Elfqueen][00ihc1ToRiqMDJ1lou1Sig][CH1SCH050053435][inet[/10.46.182.106:9300]]];
>     nested: RemoteTransportException[[Shingen Harada][inet[/10.46.153.84:9300]]
>     [internal:index/shard/recovery/start_recovery]]; nested:
>     RecoveryEngineException[[agora_v1][24] Phase[1] Execution failed];
>     nested: RecoverFilesRecoveryException[[agora_v1][24] Failed to transfer
>     [0] files with total size of [0b]]; nested:
>     NoSuchFileException[D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\24\index\segments_6r]; ]]
>
>
> AND
>
> source : shard-failed ([agora_v1][95], node[PUsHFCStRaecPA6MuvJV9g], [P],
>     s[INITIALIZING]), reason [Failed to start shard, message
>     [IndexShardGatewayRecoveryException[[agora_v1][95] failed to fetch index
>     version after copying it over]; nested:
>     CorruptIndexException[[agora_v1][95] Preexisting corrupted index
>     [corrupted_1wegvS7BSKSbOYQkX9zJSw] caused by: CorruptIndexException[Read
>     past EOF while reading segment infos]
>         EOFException[read past EOF: MMapIndexInput(path="D:\app\ES.ElasticSearch_v010\elasticsearch-1.4.1\data\AP-elasticsearch\nodes\0\indices\agora_v1\95\index\segments_11j")]
>     org.apache.lucene.index.CorruptIndexException: Read past EOF while reading segment infos
>         at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:127)
>         at org.elasticsearch.index.store.Store.access$400(Store.java:80)
>         at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:575)
> ---snip more stack trace-----
