[ https://issues.apache.org/jira/browse/STORM-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jordi Esteban updated STORM-2551:
---------------------------------
Description:
I am trying to deploy a Highly Available Nimbus using Docker. At the moment I
am only deploying two services (nimbus-1 and nimbus-2), so the configuration
file for Storm includes the following parameter: {{nimbus.seeds: [nimbus-1,
nimbus-2]}}
The issue arises when the first of the services (nimbus-1) is down. For example,
deploying a topology from nimbus-2 can take around 15 minutes. I have checked
the code, and the cause is that it loops through all {{nimbus.seeds}} hosts to
find out which one is the leader. On each iteration it creates a new
NimbusClient (and therefore a new ThriftClient), but always passes null as the
timeout for the created socket, so it keeps trying to connect to the host until
a ConnectionTimeout is reached. Modifying the parameter
{{storm.thrift.socket.timeout.ms}} does not change the socket timeout.
I think that the ThriftClient should also use the Thrift socket timeout
parameter ({{storm.thrift.socket.timeout.ms}}), in the same way as the
ThriftServer (or the transport plugin used in the communication), which was
implemented in
[STORM-2254|https://issues.apache.org/jira/browse/STORM-2254].
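To illustrate the idea, here is a minimal sketch in plain Java. It is not Storm's code and does not use NimbusClient or ThriftClient: the class {{SeedProbe}}, the method {{firstReachable}}, the port 6627 and the 3000 ms value are all assumptions made for the example. It only shows how an explicit connect timeout bounds the time spent on each unreachable seed host while looping through the list.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Storm: probes seed hosts one by one.
class SeedProbe {

    /**
     * Returns the first host that accepts a TCP connection within timeoutMs,
     * or null if none does. Passing 0 disables the Java-level limit for
     * Socket.connect, which is the kind of unbounded wait the report describes.
     */
    static String firstReachable(List<String> hosts, int port, int timeoutMs) {
        for (String host : hosts) {
            try (Socket socket = new Socket()) {
                // The second argument bounds how long this single attempt may block.
                socket.connect(new InetSocketAddress(host, port), timeoutMs);
                return host;
            } catch (IOException e) {
                // Refused, unresolved, or timed out: move on to the next seed.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> seeds = Arrays.asList("nimbus-1", "nimbus-2"); // hosts from the report
        int port = 6627;       // assumed Nimbus Thrift port
        int timeoutMs = 3000;  // hypothetical value, e.g. read from configuration
        String host = firstReachable(seeds, port, timeoutMs);
        System.out.println(host == null ? "no seed reachable" : "reachable seed: " + host);
    }
}
```

With a bounded timeout, a dead nimbus-1 costs at most {{timeoutMs}} before nimbus-2 is tried, instead of waiting for the default connection timeout.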
(This is my first issue + pull request, so sorry if something is wrong)
was:
I am trying to deploy a Highly Available Nimbus using Docker. At the moment I
am only deploying two services (nimbus-1 and nimbus-2), so the configuration
file for Storm includes the following parameter: {{nimbus.seeds: [nimbus-1,
nimbus-2]}}
The issue comes when the first of the services (nimbus-1) is down. For example
trying to deploy a topology from nimbus-2 could take like 15 minutes. I have
checked the code and it is because it loops through all {{nimbus.seeds}} hosts
in order to check which one is the leader. And for each loop it tries to create
a new NimbusClient (therefore a new ThriftClient) but always passing null as
the timeout for the created socket. So it tries to connect to the host until a
ConnectionTimeout is reached. Modifying the parameter
{{storm.thrift.socket.timeout.ms}} does not change the socket timeout.
I think that the ThriftClient should also use the thrift socket timeout
parameter ({{storm.thrift.socket.timeout.ms}}) just the same as the
ThriftServer (or the transport plugin used in the communication) which was
implemented in the Story [link
2254|https://issues.apache.org/jira/browse/STORM-2254].
(This is my first issue + pull request, so sorry if something is wrong)
> Thrift client socket timeout
> ----------------------------
>
> Key: STORM-2551
> URL: https://issues.apache.org/jira/browse/STORM-2551
> Project: Apache Storm
> Issue Type: Bug
> Reporter: Jordi Esteban