+1. The low-level timeouts that we still have in discovery and communication are very hard to explain, and I doubt there is anyone who fully understands how they currently work. They bring a lot of complexity and almost zero value. Let's deprecate them and leave only failureDetectionTimeout plus the other high-level settings that Alexey mentioned.
-Val

On Thu, Mar 1, 2018 at 6:06 AM, Ilya Kasnacheev <[email protected]> wrote:

> I agree with you.
>
> I think we could restrict usage of e.g. setConnectTimeout/setSocketTimeout
> to people extending SPIs, since different implementations may need
> different values.
>
> However, for user configurations we should only expose timeouts we can
> explain; everything else should have reasonable values.
>
> --
> Ilya Kasnacheev
>
> 2018-03-01 17:01 GMT+03:00 Alexey Popov <[email protected]>:
>
> > Hi Igniters,
> >
> > We often see similar questions from users and customers about the
> > IgniteConfiguration, TcpDiscoverySpi and TcpCommunicationSpi timeouts
> > and how they relate to each other. We also see several side effects
> > of incorrect timeout configuration.
> >
> > I tried to briefly describe these timeout settings (please see below)
> > and found that most of them do not make sense in terms of cluster
> > functions/operations and cannot be explained to users.
> >
> > I propose to deprecate most of them and leave only the timeouts we can
> > explain in common terms (setFailureDetectionTimeout, setNetworkTimeout,
> > setJoinTimeout and some others).
> >
> > Please let me know your thoughts.
> >
> > Thanks,
> > Alexey
> >
> > GLOBAL:
> >
> > IgniteConfiguration.setNetworkTimeout:
> > It is a global timeout for high-level operations where a network is
> > involved. For instance, IgniteMessaging delivery or the DiscoverySpi
> > handshake uses this timeout.
> >
> > IgniteConfiguration.setFailureDetectionTimeout:
> > It is a global timeout for detecting failures in IgniteSpi
> > implementations (including DiscoverySpi and CommunicationSpi).
> > The failure detection algorithm actually limits a range of simple
> > network operations related to a single logical operation (for instance,
> > reliable delivery of some DiscoverySpi message within a cluster).
> > Failure detection timeout is a cumulative timeout for a socket
> > connection, sending and receiving data bytes and all possible socket
> > retries (if some failure happens).
> > This timeout is intended to simplify the failure detection condition
> > from a user perspective.
> >
> > IgniteConfiguration.setClientFailureDetectionTimeout:
> > It is a special case for DiscoverySpi client nodes.
> >
> > TCP DISCOVERY SPI:
> >
> > If you need more control over the failure detection algorithm for
> > TcpDiscoverySpi you can explicitly use the following low-level options
> > (that will disable the failureDetectionTimeout logic):
> >
> > 1. TcpDiscoverySpi.setConnectTimeout - socket connection timeout
> > 2. TcpDiscoverySpi.setReconnectCount - number of reconnect attempts
> > used when establishing a connection with the remote node and sending
> > messages to it
> > 3. TcpDiscoverySpi.setSocketTimeout - socket write timeout. The write
> > operation will be repeated getReconnectCount() times if it exceeds
> > this timeout
> > 4. TcpDiscoverySpi.setAckTimeout - message acknowledgment timeout. If a
> > message acknowledgment is not received within this timeout, sending is
> > considered failed and the SPI will try to repeat the send operation.
> > It is automatically doubled for successive retries up to the
> > getMaxAckTimeout value.
> > 5. TcpDiscoverySpi.setMaxAckTimeout - maximum acknowledgment timeout.
> > If getAckTimeout reaches getMaxAckTimeout, the SPI gives up retrying
> >
> > Other important TcpDiscoverySpi timeouts:
> >
> > TcpDiscoverySpi.setJoinTimeout - a timeout for the join process when a
> > new/restarted node joins a cluster. The node tries to connect to all
> > available IP addresses provided by ipFinder within this timeout.
> > If the timeout is exceeded, the node will give up and throw an
> > exception from Ignition.start().
> >
> > TcpDiscoverySpi.setNetworkTimeout - timeout for high-level operations
> > like handshake.
> > It looks like it should be deprecated and
> > IgniteConfiguration.getNetworkTimeout should be used here.
> >
> > TCP COMMUNICATION SPI:
> >
> > If you need more control over the failure detection algorithm for
> > TcpCommunicationSpi you can explicitly use the following low-level
> > options (that will disable the failureDetectionTimeout logic):
> >
> > 1. TcpCommunicationSpi.setConnectTimeout - socket connection timeout.
> > It is automatically doubled for successive retries (up to
> > getReconnectCount) related to a single logical operation
> > 2. TcpCommunicationSpi.setMaxConnectTimeout - maximum connection
> > timeout: the upper limit for getConnectTimeout as it is doubled on
> > retries (up to getReconnectCount times)
> > 3. TcpCommunicationSpi.setReconnectCount - number of reconnect attempts
> > used when establishing a connection with the remote node and sending
> > messages to it
> >
> > Other important TcpCommunicationSpi timeouts:
> >
> > TcpCommunicationSpi.setSocketWriteTimeout - timeout to send a message.
> > TcpCommunicationSpi.setIdleConnectionTimeout - maximum idle connection
> > timeout upon which a connection will be closed.
> >
> > --
> > Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
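
To make the "doubled on each retry up to a maximum" behavior described above for ackTimeout/maxAckTimeout (and connectTimeout/maxConnectTimeout) more concrete, here is a rough model of it in plain Java. This is only a sketch of the description in this thread, not the actual SPI code: the class name, the method names and the sample values are hypothetical, and it assumes that an attempt is still made when the timeout equals the maximum and that retries stop once the next doubled value would exceed it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a timeout that doubles on every retry.
// Not taken from Ignite; it only mirrors the prose description.
class TimeoutBackoffSketch {
    /**
     * Returns the timeout (ms) used for each attempt.
     * Assumption: attempts stop when either maxAttempts is reached
     * or the next doubled timeout would exceed maxMs.
     */
    static List<Long> schedule(long initialMs, long maxMs, int maxAttempts) {
        if (initialMs <= 0 || maxMs < initialMs || maxAttempts < 1)
            throw new IllegalArgumentException("invalid timeout settings");

        List<Long> timeouts = new ArrayList<>();
        long current = initialMs;

        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            timeouts.add(current);

            // Doubling would go past the maximum (also guards overflow).
            if (current > maxMs / 2)
                break;

            current *= 2;
        }

        return timeouts;
    }

    /** Worst-case total wait if every attempt runs into its timeout. */
    static long worstCaseWaitMs(List<Long> timeouts) {
        long total = 0;
        for (long t : timeouts)
            total += t;
        return total;
    }

    public static void main(String[] args) {
        // Sample values only: 500 ms initial, 4000 ms max, 10 attempts.
        List<Long> s = schedule(500, 4000, 10);
        System.out.println(s);                  // [500, 1000, 2000, 4000]
        System.out.println(worstCaseWaitMs(s)); // 7500
    }
}
```

Even in this simplified model the effective failure detection time depends on three settings at once (initial timeout, maximum timeout and retry count), which is a large part of why these options are hard to explain compared to a single cumulative failureDetectionTimeout.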
