[ https://issues.apache.org/jira/browse/STORM-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jordi Esteban updated STORM-2551:
---------------------------------
    Description: 
I am trying to deploy a highly available Nimbus using Docker. At the moment I
am only deploying two services (nimbus-1 and nimbus-2), so the Storm
configuration file includes the following parameter:
{{nimbus.seeds: [nimbus-1, nimbus-2]}}

The problem appears when the first of the services (nimbus-1) is down. For
example, deploying a topology from nimbus-2 can take around 15 minutes. I have
checked the code, and the delay happens because the client loops through all
{{nimbus.seeds}} hosts to find out which one is the leader. On each iteration it
creates a new NimbusClient (and therefore a new ThriftClient), but it always
passes null as the timeout for the created socket, so every attempt to reach the
dead host blocks until a ConnectionTimeout is reached. Changing
{{storm.thrift.socket.timeout.ms}} does not affect this socket timeout.
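
A minimal sketch of why a missing timeout hurts, assuming a plain java.net.Socket as a stand-in for the Thrift transport. This is not Storm's NimbusClient code: the class SeedProbe, the method firstReachable, the "host:port" seed format and port 6627 in main are all hypothetical. It probes the seeds in order, as the leader lookup does, but bounds every connection attempt with Socket.connect(SocketAddress, int). With a timeout of 0, which Java treats as infinite, a dead nimbus-1 would stall the loop until the operating system gives up on the TCP connect.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch (not Storm's API): probe each seed in order and return
 * the first one that accepts a TCP connection within connectTimeoutMs.
 * A plain Socket stands in for NimbusClient/ThriftClient.
 */
public final class SeedProbe {

    private SeedProbe() {}

    /** Returns the first "host:port" seed that accepts a connection, if any. */
    public static Optional<String> firstReachable(List<String> seeds, int connectTimeoutMs) {
        for (String seed : seeds) {
            // Assumed seed format: "host:port".
            int colon = seed.lastIndexOf(':');
            String host = seed.substring(0, colon);
            int port = Integer.parseInt(seed.substring(colon + 1));
            try (Socket socket = new Socket()) {
                // A positive timeout bounds how long a dead seed can block the loop;
                // Socket.connect treats 0 as "wait indefinitely".
                socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
                return Optional.of(seed);
            } catch (IOException e) {
                // Refused, unresolvable, unreachable or timed out: try the next seed.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical seeds; 6627 is assumed to be the Nimbus Thrift port.
        List<String> seeds = List.of("nimbus-1:6627", "nimbus-2:6627");
        System.out.println(firstReachable(seeds, 5_000).orElse("no seed reachable"));
    }
}
```

Skipping a seed on any IOException keeps the sketch short; a real client would probably log the failure and distinguish a timeout from a refused connection.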

I think the ThriftClient should also use the Thrift socket timeout parameter
({{storm.thrift.socket.timeout.ms}}), just as the ThriftServer (or the transport
plugin used for the communication) does, as implemented in
[STORM-2254|https://issues.apache.org/jira/browse/STORM-2254].
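
The proposal can be sketched in the same way. Again, this is a hypothetical illustration rather than a patch against Storm: ClientSocketTimeout, resolve, open and the 30-second fallback are made up, and only the key storm.thrift.socket.timeout.ms comes from the issue. The configured value bounds both the connect (Socket.connect) and later blocking reads (Socket.setSoTimeout).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Map;

/**
 * Hypothetical sketch (not Storm's API): derive the client socket timeout from
 * storm.thrift.socket.timeout.ms instead of leaving it unset.
 */
public final class ClientSocketTimeout {

    static final String SOCKET_TIMEOUT_KEY = "storm.thrift.socket.timeout.ms";

    private ClientSocketTimeout() {}

    /** Reads the timeout from a Storm-style config map, falling back when absent. */
    public static int resolve(Map<String, Object> conf, int fallbackMs) {
        Object value = conf.get(SOCKET_TIMEOUT_KEY);
        if (value instanceof Number) {
            // YAML loaders may yield Integer or Long, so accept any Number.
            return ((Number) value).intValue();
        }
        return fallbackMs;
    }

    /** Opens a socket whose connect and blocking reads are bounded by the configured timeout. */
    public static Socket open(Map<String, Object> conf, String host, int port) throws IOException {
        int timeoutMs = resolve(conf, 30_000); // hypothetical fallback, not Storm's default
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs); // also bound blocking reads
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }
}
```

Using one value for both the connect and the read timeout keeps a single setting, as the issue suggests; a finer-grained design might expose separate connect and read timeouts.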

(This is my first issue and pull request, so apologies if anything is wrong.)

  was:
I am trying to deploy a highly available Nimbus using Docker. At the moment I
am only deploying two services (nimbus-1 and nimbus-2), so the Storm
configuration file includes the following parameter:
{{nimbus.seeds: [nimbus-1, nimbus-2]}}

The problem appears when the first of the services (nimbus-1) is down. For
example, deploying a topology from nimbus-2 can take around 15 minutes. I have
checked the code, and the delay happens because the client loops through all
{{nimbus.seeds}} hosts to find out which one is the leader. On each iteration it
creates a new NimbusClient (and therefore a new ThriftClient), but it always
passes null as the timeout for the created socket, so every attempt to reach the
dead host blocks until a ConnectionTimeout is reached. Changing
{{storm.thrift.socket.timeout.ms}} does not affect this socket timeout.

I think the ThriftClient should also use the Thrift socket timeout parameter
({{storm.thrift.socket.timeout.ms}}), just as the ThriftServer (or the transport
plugin used for the communication) does, as implemented in the Story [link 
2254|https://issues.apache.org/jira/browse/STORM-2254].

(This is my first issue and pull request, so apologies if anything is wrong.)


> Thrift client socket timeout
> ----------------------------
>
>                 Key: STORM-2551
>                 URL: https://issues.apache.org/jira/browse/STORM-2551
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: Jordi Esteban
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
