Re: Shard copying performance

Michael Salmon Mon, 05 May 2014 04:56:25 -0700

This is the exception that I posted earlier:

[2014-04-28 13:40:15,039][WARN ][cluster.action.shard] [eis05]
[ds_clearcase-vob-heat-analyzer][2] sending failed shard for
[ds_clearcase-vob-heat-analyzer][2], node[QyeTlW2YQbG27zrsdjBBGA], [R],
s[INITIALIZING], indexUUID [ms7jQeuMQduNIHCmjxsKjQ], reason [Failed to
start shard, message
[RecoveryFailedException[[ds_clearcase-vob-heat-analyzer][2]: Recovery
failed from
[eis09][p8-_fzHeTR22pSlsBsYm8A][eis09.rnditlab.ericsson.se][inet[/137.58.184.239:9300]]{datacenter=PoCC}
into [eis05][QyeTlW2YQbG27zrsdjBBGA][eis05.rnditlab.ericsson.se][inet[eis05.rnditlab.ericsson.se/137.58.184.235:9300]]{datacenter=PoCC}];
nested:
RemoteTransportException[[eis09][inet[/137.58.184.239:9300]][index/shard/recovery/startRecovery]];
nested: RecoveryEngineException[[ds_clearcase-vob-heat-analyzer][2]
Phase[2] Execution failed]; nested:
ReceiveTimeoutTransportException[[eis05][inet[/137.58.184.235:9300]][index/shard/recovery/prepareTranslog]
request_id [6809886] timed out after [900000ms]]; ]]
[2014-04-28 14:00:11,614][WARN ][indices.cluster] [eis05] 
[ds_clearcase-vob-heat-analyzer][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException:
[ds_clearcase-vob-heat-analyzer][0]: Recovery failed from
[eis07][Q8ZWgDIXRGiUej1oMoH8Jg][eis07.rnditlab.ericsson.se][inet[/137.58.184.237:9300]]{datacenter=PoCC}
into [eis05][QyeTlW2YQbG27zrsdjBBGA][eis05.rnditlab.ericsson.se][inet[eis05.rnditlab.ericsson.se/137.58.184.235:9300]]{datacenter=PoCC}
    at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:307)
    at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:65)
    at org.elasticsearch.indices.recovery.RecoveryTarget$3.run(RecoveryTarget.java:184)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.transport.RemoteTransportException:
[eis07][inet[/137.58.184.237:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
[ds_clearcase-vob-heat-analyzer][0] Phase[2] Execution failed
    at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:1098)
    at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:627)
    at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:117)
    at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:61)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:337)
    at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:323)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException:
[eis05][inet[/137.58.184.235:9300]][index/shard/recovery/prepareTranslog]
request_id [154592652] timed out after [900000ms]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
    ... 3 more
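Both traces end the same way: eis05 does not answer the prepareTranslog request in time. A quick sketch for pulling those timeouts out of a log like the one above and converting them to minutes; the regex is my own guess at the line format (not a documented one) and find_timeouts is a hypothetical helper name. It confirms that 900000ms is exactly the 15 minutes after which phase 2 fails.

```python
import re

# Matches fragments such as:
#   [index/shard/recovery/prepareTranslog] request_id [6809886] timed out after [900000ms]
# The pattern is an assumption based on the log lines above, not a documented format.
TIMEOUT_RE = re.compile(
    r"\[(?P<action>[\w/]+)\]\s+request_id \[(?P<request_id>\d+)\]\s+"
    r"timed out after \[(?P<ms>\d+)ms\]"
)


def find_timeouts(log_text):
    """Return (action, request_id, minutes) for every transport timeout in log_text."""
    results = []
    for m in TIMEOUT_RE.finditer(log_text):
        minutes = int(m.group("ms")) / 60000  # 60,000 ms per minute
        results.append((m.group("action"), int(m.group("request_id")), minutes))
    return results


sample = """
ReceiveTimeoutTransportException[[eis05][inet[/137.58.184.235:9300]][index/shard/recovery/prepareTranslog]
request_id [6809886] timed out after [900000ms]]; ]]
[eis05][inet[/137.58.184.235:9300]][index/shard/recovery/prepareTranslog]
request_id [154592652] timed out after [900000ms]
"""

for action, request_id, minutes in find_timeouts(sample):
    print(f"{action} request {request_id}: {minutes:.0f} min")
```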


I checked in the log that max_bytes_per_sec was changed; it accepts both MB
and mb.
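For reference, a minimal sketch of applying the same recovery settings, with the lower-case units Alex suggested, through the cluster update settings endpoint (PUT /_cluster/settings). It only builds the request with the Python standard library. The host http://localhost:9200 and the helper name build_settings_request are placeholders, and the flat dotted keys are assumed to be equivalent to the nested JSON in my original post.

```python
import json
import urllib.request


def build_settings_request(host="http://localhost:9200", max_bytes="250mb"):
    """Build (but do not send) a transient cluster settings update.

    host and the helper name are placeholders; the keys mirror the nested
    settings posted earlier, written as flat dotted names.
    """
    body = {
        "transient": {
            "indices.store.throttle.type": "none",
            # Lower-case units throughout, as suggested for some ES versions.
            "indices.recovery.translog_size": "256mb",
            "indices.recovery.concurrent_streams": 16,
            "indices.recovery.translog_ops": 10000,
            "indices.recovery.max_bytes_per_sec": max_bytes,
        }
    }
    return urllib.request.Request(
        url=host + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_settings_request()
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it against a live cluster: urllib.request.urlopen(req).read()
```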

I am also changing my log level to TRACE, but restarting the servers takes a
long while.
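If the running version accepts logger.* keys as dynamic cluster settings (the transient "logger" block in my original post assumes it does), it may be possible to raise only the recovery logger to TRACE without a restart. A sketch that just builds the request; the host is a placeholder, and the logger name indices.recovery is an assumption based on the package in the stack trace and the [indices.cluster]-style names in the log.

```python
import json
import urllib.request


def build_logger_request(host="http://localhost:9200",
                         logger="indices.recovery", level="TRACE"):
    """Build a transient settings update that changes one logger's level.

    Assumes dynamic logger.* settings are supported; host and the logger
    name are placeholders/assumptions, not confirmed values.
    """
    body = {"transient": {"logger." + logger: level}}
    return urllib.request.Request(
        url=host + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_logger_request()
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Setting it back to INFO afterwards is the same call with level="INFO".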

On Monday, 5 May 2014 12:25:50 UTC+2, Alexander Reelsen wrote:
>
> Hey,
>
> you could change your default log level (to either DEBUG or TRACE) to find
> out whether those settings are actually applied. Depending on the
> Elasticsearch version you are using, you might want to try a lower-case
> value for max_bytes_per_sec, i.e. "250mb". Also, can you show the exception
> which contains the "timeout in phase 2"?
>
>
> --Alex
>
>
> On Tue, Apr 29, 2014 at 3:50 PM, Michael Salmon
> <[email protected]> wrote:
>
>> I am having trouble replicating a shard and I cannot see any possible 
>> reason for it. After 15 minutes I get a timeout in phase 2.
>>
>> The shard isn't that large: about 60,000K, 5GB and 22 segments, and the
>> translog directories are empty.
>> The computers in question are lightly loaded as is the network between 
>> them.
>> Copying all the files in the shard from all 4 disks between the two 
>> computers with rsync takes about 40 seconds.
>> I can't run checkIndex on the source machine, as it can't handle shards
>> that are spread over multiple disks, but it runs quite happily on the files
>> I copied with rsync, although the check took a bit over 12 minutes.
>> I have ES 1.1.0 installed.
>> I changed some settings but none of them seem to make much difference:
>>
>>    "transient": {
>>       "logger": {
>>          "level": "TRACE"
>>       },
>>       "indices": {
>>          "store": {
>>             "throttle": {
>>                "type": "none"
>>             }
>>          },
>>          "recovery": {
>>             "translog_size": "256MB",
>>             "concurrent_streams": "16",
>>             "translog_ops": "10000",
>>             "max_bytes_per_sec": "250MB"
>>          }
>>       }
>>    }
>>
>> Does anyone have any tips on how I should proceed?
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/a85c76cb-72d5-45c4-82cf-d8c8867a2151%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
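The figures in the original post can be set side by side with a little arithmetic. This sketch assumes binary units (1GB = 1024MB) and a steady transfer rate; transfer_seconds is a hypothetical helper name.

```python
def transfer_seconds(size_mb, rate_mb_per_sec):
    """Time to move size_mb megabytes at a steady rate_mb_per_sec."""
    return size_mb / rate_mb_per_sec


SHARD_MB = 5 * 1024               # the "5GB" shard, assuming 1GB = 1024MB
RSYNC_SECONDS = 40                # observed rsync copy time
TIMEOUT_SECONDS = 900000 / 1000   # the 900000ms transport timeout, in seconds

rsync_rate = SHARD_MB / RSYNC_SECONDS              # MB/s actually achieved by rsync
throttled = transfer_seconds(SHARD_MB, 250)        # at max_bytes_per_sec = 250MB

print(f"rsync throughput: {rsync_rate:.1f} MB/s")
print(f"copy at 250MB/s: {throttled:.2f} s")
print(f"timeout: {TIMEOUT_SECONDS:.0f} s ({TIMEOUT_SECONDS / 60:.0f} min)")
```

Since rsync only reached about 128MB/s, the network rather than the 250MB cap would probably be the limit, but either way the file copy should take well under a minute, far inside the 15-minute window. That suggests, without proving it, that the time is going somewhere other than raw file transfer.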

To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c04725d9-ef92-4c67-ac33-cb8fd96def06%40googlegroups.com.