[
https://issues.apache.org/jira/browse/IGNITE-19715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mikhail Petrov updated IGNITE-19715:
------------------------------------
Description:
Thin client operations can take a long time if partition awareness (PA) is
enabled and some cluster nodes are not reachable over the network.
Consider the following scenario:
1. The thin client has already successfully established connections to all
configured node addresses.
2. A particular cluster node becomes unreachable over the network. This can be
reproduced with iptables -A INPUT -p tcp --dport on Linux.
3. The thin client periodically sends a put request that PA maps to the
unreachable node.
4. At first, every attempt to perform the put fails with the exception
`ClientException: Timeout was reached before computation completed.`
Eventually, however, the OS closes the connection to the unreachable node (see
tcp_keepalive_time for Linux).
The client then tries to reestablish the connection to the unreachable node
while handling the next put (see ReliableChannel.java:1012).
We currently do not set a timeout for the connection open operation (see
GridNioClientConnectionMultiplexer#open, where we pass Integer.MAX_VALUE to
Socket#connect(java.net.SocketAddress, int)).
As a result, the Socket#connect operation (and hence the put operation) hangs
for a significant amount of time (this depends on OS parameters; usually it is
a couple of minutes). This is confusing for users because a single put may take
much longer than the timeout configured via ClientConfiguration#setTimeout.
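
To illustrate the point about Socket#connect, here is a minimal sketch of
bounding the connect call by the request timeout. It is not the Ignite code and
not necessarily how the issue will be fixed: the class BoundedConnect and the
methods connectTimeoutMs and open are hypothetical names, and the assumption
that a non-positive request timeout means "no limit" is made for this
illustration only. The sketch relies solely on java.net.Socket, where a connect
timeout of 0 means an infinite wait.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BoundedConnect {
    /**
     * Hypothetical helper: maps a request timeout in milliseconds to a value
     * accepted by Socket#connect(SocketAddress, int).
     * Assumption: a non-positive request timeout means "no limit".
     */
    public static int connectTimeoutMs(long requestTimeoutMs) {
        if (requestTimeoutMs <= 0)
            return 0; // Socket#connect treats 0 as an infinite timeout.

        // Clamp to the int range expected by Socket#connect.
        return (int) Math.min(requestTimeoutMs, Integer.MAX_VALUE);
    }

    /** Opens a socket, waiting no longer than the request timeout to connect. */
    public static Socket open(InetSocketAddress addr, long requestTimeoutMs) throws IOException {
        Socket sock = new Socket();

        try {
            // Throws SocketTimeoutException if the timeout expires first.
            sock.connect(addr, connectTimeoutMs(requestTimeoutMs));

            return sock;
        }
        catch (IOException e) {
            sock.close(); // Do not leak the descriptor on failure.

            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // A local listener stands in for a reachable cluster node.
        try (ServerSocket srv = new ServerSocket(0)) {
            InetSocketAddress addr = new InetSocketAddress("127.0.0.1", srv.getLocalPort());

            try (Socket sock = open(addr, 500)) {
                System.out.println("connected=" + sock.isConnected());
            }
        }
    }
}
```

With a bound like this, a connect attempt to an unreachable address would fail
with SocketTimeoutException after roughly the request timeout instead of
waiting for the OS-level connection timeout.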
was:
Thin client operations can take a long time if partition awareness (PA) is
enabled and some cluster nodes are not reachable over the network.
Consider the following scenario:
1. The thin client has already successfully established connections to all
configured node addresses.
2. A particular cluster node becomes unreachable over the network. This can be
reproduced with iptables -A INPUT -p tcp --dport on Linux.
3. The thin client periodically sends a put request that PA maps to the
unreachable node.
4. At first, every attempt to perform the put fails with the exception
`ClientException: Timeout was reached before computation completed.`
Eventually, however, the OS closes the connection to the unreachable node (see
tcp_keepalive_time for Linux).
The client then tries to reestablish the connection to the unreachable node
while handling the next put (see ReliableChannel.java:1012).
We currently do not set a timeout for the connection open operation (see
GridNioClientConnectionMultiplexer#open, where we pass Integer.MAX_VALUE to
Socket#connect(java.net.SocketAddress, int)).
As a result, the put operation hangs for a significant amount of time (this
depends on OS parameters; usually it is a couple of minutes). This is confusing
for users because a single PUT takes much longer than the timeout configured
via ClientConfiguration#setTimeout.
> Thin client operations can take a long time if PA is enabled and some cluster
> nodes are not network reachable.
> --------------------------------------------------------------------------------------------------------------
>
> Key: IGNITE-19715
> URL: https://issues.apache.org/jira/browse/IGNITE-19715
> Project: Ignite
> Issue Type: Bug
> Reporter: Mikhail Petrov
> Assignee: Mikhail Petrov
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h