Hi Nikhil,

The wrong IP in the unicast list is unrelated to the error. The unicast list is 
only used when a node first starts up and needs to join the cluster. Once 
that's done, the list is not used anymore. It does mean you could potentially 
have had the first master node (whose IP was wrong in the list) elect itself 
as master, with nobody else connecting to it (if it was the first to start).
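
If it helps, one quick way to confirm which node each side actually elected as 
master (just a sketch, assuming the default HTTP port 9200 and the 0.90.x 
cluster state API) is to ask any node for its cluster state and look at the 
master_node field:

    # Ask a node which master it currently follows; "master_node" holds the node id
    # of the elected master, resolvable against the "nodes" section of the response.
    curl -s 'http://localhost:9200/_cluster/state?pretty' | grep master_node

If the two master-eligible nodes reported different ids, they had formed 
separate clusters.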

As to the error - I'm not sure exactly what caused the shards not to be 
ready in time. Is there anything you can see in the ES logs? Also, does the 
indexing succeed once retried? Could it be that this is always the first 
message in its respective index?
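
For the shard side of it, cluster health at shard level should show whether 
any primaries of that index were still unassigned or initializing at the time 
(again just a sketch, assuming the default HTTP port; the index name is taken 
from the error above):

    # Cluster-wide view first, then per-shard detail for that day's logstash index
    curl -s 'http://localhost:9200/_cluster/health?pretty'
    curl -s 'http://localhost:9200/_cluster/health/logstash-2014.03.10?level=shards&pretty'

Anything other than an active primary for shard [1] of that index at the time 
of the failure would line up with the UnavailableShardsException.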

Cheers,
Boaz

On Wednesday, March 12, 2014 6:31:28 PM UTC+1, Nikhil Singh wrote:
>
> I forgot to mention the ES version that we are running. It is 0.90.3
>
> On Wednesday, March 12, 2014 10:54:46 PM UTC+5:30, Nikhil Singh wrote:
>>
>> Hi all,
>>
>> We have a 5 node elasticsearch setup as follows:
>>
>>    - 3 nodes running in data mode
>>    - 2 nodes running in master-only mode
>>    
>> Each of the nodes has the following configuration options set in its 
>> elasticsearch.yml file:
>>
>>> index.number_of_shards: 6
>>> index.number_of_replicas: 0
>>> index.mapper.dynamic: true
>>> action.auto_create_index: true
>>> action.disable_delete_all_indices: true
>>
>>
>> What I understand from the above config is that each index will have 6 
>> shards and there will be no replicas of those shards. Also, we 
>> have the following discovery options set:
>>
>>> discovery.zen.minimum_master_nodes: 1
>>> discovery.zen.ping.multicast.enabled: false
>>> discovery.zen.ping.unicast.hosts: ["10.101.55.93[9300-9400]", 
>>> "10.242.22.126[9300-9400]", "10.144.77.94[9300-9400]", 
>>> "10.116.173.148[9300-9400]", "10.224.42.205[9300-9400]"]
>>
>>
>> We are also running logstash, which is indexing documents into this 
>> elasticsearch setup. Yesterday, I started noticing the following errors in 
>> the logstash indexer log file:
>>
>>> {:timestamp=>"2014-03-10T11:05:44.453000+0000", :message=>"Failed to 
>>> index an event, will retry", :exception=>
>>> *org.elasticsearch.action.UnavailableShardsException*: 
>>> [logstash-2014.03.10][1] [1] shardIt, [0] active : Timeout waiting for 
>>> [1m], request: index 
>>> {[logstash-2014.03.10][application][uMrD19E4QuKepoBq17ZKQA], 
>>> source[{"@source":"file://ip-10-118-115-235//mnt/deploy/apache-tomcat/logs/application.log","@tags":["shipped"],"@fields":{"environment":["production"],"service":["application"],"machine":["proxy1"],"timestamp":"03/10/14
>>>  
>>> 06:03:07.292","thread":"http-8080-126","severity":"DEBUG","message":"ources.AccountServicesResource
>>>  
>>> idx=0"},"@timestamp":"2014-03-10T06:03:07.292Z","@source_host":"ip-10-118-115-235","@source_path":"//mnt/deploy/apache-tomcat/logs/application.log","@message":"03/10/14
>>>  
>>> 06:03:07.292 [http-8080-126] DEBUG ources.AccountServicesResource 
>>> idx=0","@type":"application"}]}, 
>>> :event=>{"@source"=>"file://ip-10-118-115-235//mnt/deploy/apache-tomcat/logs/application.log",
>>>  
>>> "@tags"=>["shipped"], "@fields"=>{"environment"=>["production"], 
>>> "service"=>["application"], "machine"=>["proxy1"], "timestamp"=>"03/10/14 
>>> 06:03:07.292", "thread"=>"http-8080-126", "severity"=>"DEBUG", 
>>> "message"=>"ources.AccountServicesResource idx=0"}, 
>>> "@timestamp"=>"2014-03-10T06:03:07.292Z", 
>>> "@source_host"=>"ip-10-118-115-235", 
>>> "@source_path"=>"//mnt/deploy/apache-tomcat/logs/application.log", 
>>> "@message"=>"03/10/14 06:03:07.292 [http-8080-126] DEBUG 
>>> ources.AccountServicesResource idx=0", "@type"=>"application"}, 
>>> :level=>:warn} 
>>
>>
>> While I was debugging this issue, I noticed that the first IP in 
>> *discovery.zen.ping.unicast.hosts* was wrong. That IP did not point to any 
>> of the 5 nodes in the cluster. I realized that the first IP should be that 
>> of one of the nodes configured as master, so I changed it to the correct 
>> one on all of the nodes and restarted ES. After that change, I no longer 
>> see the above error.
>>
>> I have a question - considering the first IP was wrong, the cluster would 
>> have elected the only other node configured as master as the *master*. 
>> This means that there was at least one master in the cluster. So, for this 
>> exception to happen, could the other master's metadata about shards be 
>> wrong?
>>
>
