If clientFailureDetectionTimeout is not set on a server node, will the server use failureDetectionTimeout instead?
Either way, this configuration seems a bit confusing, but I don't think we can change it now. Let's just make sure it's properly documented.

-Val

On Mon, Jul 9, 2018 at 5:47 AM Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
> A server uses its failureDetectionTimeout when talking to servers and its clientFailureDetectionTimeout when talking to clients.
> E.g. a communication link from server to server uses failureDetectionTimeout, and a link from server to client uses clientFailureDetectionTimeout.
>
> A client uses its failureDetectionTimeout all the time, ignoring clientFailureDetectionTimeout.
>
> Asymmetric settings are even possible.
> Say a server and a client use the same config, failureDetectionTimeout=10 and clientFailureDetectionTimeout=20.
> When these two nodes communicate, the server will use a timeout of 20 seconds and the client will use a timeout of 10 seconds.
>
> Stan
>
> From: Valentin Kulichenko
> Sent: 6 July 2018, 23:17
> To: dev@ignite.apache.org
> Subject: Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts
>
> Stan,
>
> Can you explain the semantics of both parameters? How do they behave when set on a client or on a server?
>
> -Val
>
> On Fri, Jul 6, 2018 at 6:12 AM Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
> > We could just use failureDetectionTimeout all the time, I guess.
> > The only benefit of clientFailureDetectionTimeout is that it may allow clients to be slower, or on a slower network, than servers.
> >
> > Do you think it isn’t worth having a separate setting just for that?
> >
> > Thanks,
> > Stan
> >
> > From: Valentin Kulichenko
> > Sent: 5 July 2018, 18:16
> > To: dev@ignite.apache.org
> > Subject: Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts
> >
> > Stan,
> >
> > What is the purpose of clientFailureDetectionTimeout? Why can't we just always use failureDetectionTimeout? Is there any difference between these two timeouts?
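Stan's July 9 description of which timeout each end of a link applies can be captured in a few lines. The sketch below is a hypothetical Python model, not Ignite code: `NodeConfig` and `effective_timeout` are invented names, and the rules are only the ones stated in the thread (servers choose by peer type, clients always use failureDetectionTimeout).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical stand-in for the two IgniteConfiguration settings discussed."""
    failure_detection_timeout: int         # failureDetectionTimeout
    client_failure_detection_timeout: int  # clientFailureDetectionTimeout
    is_client: bool                        # True for a client node


def effective_timeout(local: NodeConfig, remote_is_client: bool) -> int:
    """Timeout the local node applies on a link to a remote node.

    Models the behavior described in the thread, not Ignite's actual code.
    """
    if local.is_client:
        # Clients always use their own failureDetectionTimeout,
        # ignoring clientFailureDetectionTimeout.
        return local.failure_detection_timeout
    if remote_is_client:
        # Servers use clientFailureDetectionTimeout when talking to clients...
        return local.client_failure_detection_timeout
    # ...and failureDetectionTimeout when talking to other servers.
    return local.failure_detection_timeout


# Stan's example: both nodes share failureDetectionTimeout=10, clientFailureDetectionTimeout=20.
server = NodeConfig(10, 20, is_client=False)
client = NodeConfig(10, 20, is_client=True)
print(effective_timeout(server, remote_is_client=True))   # server end of the link: 20
print(effective_timeout(client, remote_is_client=False))  # client end of the link: 10
```

With one shared config, the two ends of the same server-client link end up with different values, which is the asymmetry Stan points out.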
> >
> > -Val
> >
> > On Wed, Jul 4, 2018 at 7:00 AM Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
> > > Hi,
> > >
> > > I’ve updated the proposed documentation with a description of metricsUpdateFrequency and a detailed description of how failureDetectionTimeout and clientFailureDetectionTimeout relate. The draft is attached to https://issues.apache.org/jira/browse/IGNITE-7704.
> > >
> > > It seems that the relation between failureDetectionTimeout and clientFailureDetectionTimeout is currently too tricky and should also be changed in the future.
> > > The problem is that in a server-client connection the server uses clientFailureDetectionTimeout but the client uses failureDetectionTimeout.
> > > In other words, clients ignore clientFailureDetectionTimeout and just use failureDetectionTimeout. Because of that, one has to provide different values of failureDetectionTimeout in the server and client configs, which seems confusing and inconvenient.
> > > So I’d like to add one more point to my earlier proposal:
> > >
> > > 5. Always use clientFailureDetectionTimeout on clients instead of failureDetectionTimeout
> > > *What*: change the code to use clientFailureDetectionTimeout on clients
> > > *When*: update the code and the readme.io docs in 2.7
> > >
> > > Thanks,
> > > Stan
> > >
> > > From: Valentin Kulichenko
> > > Sent: 30 May 2018, 19:09
> > > To: dev@ignite.apache.org
> > > Subject: Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts
> > >
> > > Stan,
> > >
> > > It looks like you're suggesting changing only the default. If so, it's OK. But let's not change the behavior of these timeouts when they are explicitly set in the config.
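Item 5 of Stan's proposal (clients use clientFailureDetectionTimeout) can be compared with the current behavior using a small hypothetical model. `Timeouts` and `server_client_link` are illustrative names, not Ignite API, and the model covers only a single server-client link.

```python
from typing import NamedTuple, Tuple


class Timeouts(NamedTuple):
    """Hypothetical holder for the two IgniteConfiguration values."""
    failure_detection: int         # failureDetectionTimeout
    client_failure_detection: int  # clientFailureDetectionTimeout


def server_client_link(server: Timeouts, client: Timeouts, proposed: bool) -> Tuple[int, int]:
    """Return (timeout used by the server end, timeout used by the client end).

    proposed=False: behavior described in the thread (the client ignores
    clientFailureDetectionTimeout).
    proposed=True: item 5 (the client uses clientFailureDetectionTimeout).
    """
    # The server end uses clientFailureDetectionTimeout for a client peer.
    server_side = server.client_failure_detection
    if proposed:
        client_side = client.client_failure_detection
    else:
        client_side = client.failure_detection
    return server_side, client_side


shared = Timeouts(failure_detection=10, client_failure_detection=20)
print(server_client_link(shared, shared, proposed=False))  # (20, 10): the ends disagree
print(server_client_link(shared, shared, proposed=True))   # (20, 20): one config, one value
```

Under the proposal a single shared config yields the same value on both ends, so users would no longer need different failureDetectionTimeout values in server and client configs.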
> > >
> > > Thanks,
> > > Val
> > >
> > > On Wed, May 30, 2018 at 1:06 AM, Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
> > > > On networkTimeout: no, we don’t have anything like that in TcpCommunicationSpi.
> > > >
> > > > On socketWriteTimeout:
> > > > First, its semantics are very close to those of TcpDiscoverySpi.socketTimeout (except that communication uses NIO), and the latter defaults to failureDetectionTimeout, so I think defaulting socketWriteTimeout to it as well would help avoid confusion.
> > > > Second, I think we can’t deprecate something without an alternative that would work for most users.
> > > > On the other hand, if we do default socketWriteTimeout to failureDetectionTimeout, then we reach a pretty decent API state where one only needs two properties in IgniteConfiguration, neither of which we’re considering for deprecation and removal in 3.0.
> > > >
> > > > Stan
> > > >
> > > > From: Valentin Kulichenko
> > > > Sent: 29 May 2018, 22:17
> > > > To: dev@ignite.apache.org
> > > > Subject: Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts
> > > >
> > > > Stan,
> > > >
> > > > OK, I got a little confused :)
> > > >
> > > > I do agree that TcpDiscoverySpi.networkTimeout should inherit from IgniteConfiguration.networkTimeout if not set explicitly. Do we have the same setting for TcpCommunicationSpi, BTW? If yes, the behavior should be consistent.
> > > >
> > > > As for TcpCommunicationSpi.socketWriteTimeout, I'm not sure why you want to change its behavior. Can we just deprecate it and eventually remove it, just as we plan to do for all the timeouts from #2?
> > > >
> > > > -Val
> > > >
> > > > On Tue, May 29, 2018 at 3:50 AM, Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
> > > > > Val,
> > > > >
> > > > > Which timeouts do you mean?
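The defaulting rule discussed here (Val: TcpDiscoverySpi.networkTimeout inherits IgniteConfiguration.networkTimeout when not set explicitly; Stan: TcpCommunicationSpi.socketWriteTimeout defaults to failureDetectionTimeout; Val: explicitly set values keep their current behavior) amounts to a lookup with a fallback. The sketch below assumes exactly those two mappings; `ORPHAN_DEFAULTS` and `effective_spi_timeouts` are hypothetical helpers, and the numbers are arbitrary rather than Ignite defaults.

```python
from typing import Dict, Mapping

# Hypothetical mapping: SPI-level property -> the global IgniteConfiguration
# property it would inherit from when not set explicitly (per this thread).
ORPHAN_DEFAULTS: Dict[str, str] = {
    "TcpCommunicationSpi.socketWriteTimeout": "failureDetectionTimeout",
    "TcpDiscoverySpi.networkTimeout": "networkTimeout",
}


def effective_spi_timeouts(global_cfg: Mapping[str, int],
                           spi_cfg: Mapping[str, int]) -> Dict[str, int]:
    """Resolve each orphan timeout: an explicit value wins, otherwise inherit the global one."""
    resolved: Dict[str, int] = {}
    for spi_prop, global_prop in ORPHAN_DEFAULTS.items():
        if spi_prop in spi_cfg:
            # Explicitly set: keep the user's value unchanged.
            resolved[spi_prop] = spi_cfg[spi_prop]
        else:
            # Not set: fall back to the global counterpart.
            resolved[spi_prop] = global_cfg[global_prop]
    return resolved


# Arbitrary illustrative values in milliseconds, not Ignite defaults.
global_cfg = {"failureDetectionTimeout": 10_000, "networkTimeout": 5_000}
print(effective_spi_timeouts(global_cfg, {}))
print(effective_spi_timeouts(global_cfg, {"TcpDiscoverySpi.networkTimeout": 7_000}))
```

Keeping the explicit-value branch separate makes the "only change the default" condition easy to see: a user who already sets the SPI property gets exactly what they had before.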
> > > > >
> > > > > In #2 I don’t propose to change behavior.
> > > > >
> > > > > I do propose to change behavior for a couple of settings in #3, though.
> > > > > I believe the correct approach here would be to target the behavior change for 2.6, but keep in mind that we’ll need to carefully analyze the impact before actually making the changes.
> > > > >
> > > > > Thanks,
> > > > > Stan
> > > > >
> > > > > From: Valentin Kulichenko
> > > > > Sent: 29 May 2018, 0:57
> > > > > To: dev@ignite.apache.org
> > > > > Subject: Re: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts
> > > > >
> > > > > Hi Stan,
> > > > >
> > > > > I'm 100% for this activity; however, I don't think we should change the behavior of the timeouts you listed in #2, since this can lead to unexpected behavior for users who already use them. I would just deprecate them and eventually remove them.
> > > > >
> > > > > -Val
> > > > >
> > > > > On Mon, May 28, 2018 at 1:29 PM, Stanislav Lukyanov <stanlukya...@gmail.com> wrote:
> > > > > > Hi folks,
> > > > > >
> > > > > > It looks like we stopped halfway with this activity. I’d like to pick it up.
> > > > > >
> > > > > > Everyone seems to agree that we should simplify the timeout settings.
> > > > > > Here are the specific actions I’d like to propose:
> > > > > >
> > > > > > 1. Promote the use of global timeouts as the best practice
> > > > > > *What*: update the docs to encourage users to rely on the following timeouts for their “network stability” settings:
> > > > > > IgniteConfiguration.failureDetectionTimeout
> > > > > > IgniteConfiguration.clientFailureDetectionTimeout
> > > > > > IgniteConfiguration.networkTimeout
> > > > > > *When*: update the readme.io docs for 2.5 and the Javadoc for 2.6
> > > > > >
> > > > > > 2. Discourage the use of finer-grained timeouts
> > > > > > *What*:
> > > > > > - update the docs to discourage users from using the following timeouts and announce their upcoming deprecation and removal:
> > > > > > TcpDiscoverySpi.socketTimeout
> > > > > > TcpDiscoverySpi.ackTimeout
> > > > > > TcpDiscoverySpi.maxAckTimeout
> > > > > > TcpDiscoverySpi.reconnectCount
> > > > > > TcpCommunicationSpi.connectTimeout
> > > > > > TcpCommunicationSpi.maxConnectTimeout
> > > > > > TcpCommunicationSpi.reconnectCount
> > > > > > - deprecate the properties in code
> > > > > > - remove the properties in code
> > > > > > *When*:
> > > > > > - readme.io update with deprecation announcement for 2.5
> > > > > > - @Deprecated in code + Javadoc update + respective readme.io rewording for 2.6
> > > > > > - properties removal in 3.0
> > > > > >
> > > > > > 3. Make “orphan” timeouts rely on global timeouts, then deprecate and remove them
> > > > > > *What*:
> > > > > > Two settings currently don’t default to their global equivalents, although they should:
> > > > > > - TcpCommunicationSpi.socketWriteTimeout should default to failureDetectionTimeout
> > > > > > - TcpDiscoverySpi.networkTimeout should default to IgniteConfiguration.networkTimeout
> > > > > > So the course of action would be:
> > > > > > - update the docs to explain that these timeouts have to be used for now, but announce their upcoming deprecation and removal
> > > > > > - change the properties to default to their global counterparts and deprecate them in code
> > > > > > - remove the properties in code
> > > > > > *When*:
> > > > > > - readme.io update with deprecation announcement for 2.5
> > > > > > - changing defaults + @Deprecated in code + Javadoc update + respective readme.io rewording for 2.6
> > > > > > - properties removal in 3.0
> > > > > >
> > > > > > 4. Don’t touch other timeouts
> > > > > > Other timeouts, like TcpDiscoverySpi.joinTimeout or TcpCommunicationSpi.idleConnectionTimeout, are orthogonal to the whole “network stability” theme discussed above and don’t have to be changed.
> > > > > >
> > > > > > Finally, I’ve prepared a draft of the docs page that may be used as a base for the readme.io update.
> > > > > > This email is pretty long already, so please find the draft attached to the JIRA issue https://issues.apache.org/jira/browse/IGNITE-7704.
> > > > > >
> > > > > > Please share your thoughts.
> > > > > >
> > > > > > Thanks,
> > > > > > Stan
> > > > > >
> > > > > > From: Alexey Popov
> > > > > > Sent: 1 March 2018, 17:01
> > > > > > To: dev@ignite.apache.org
> > > > > > Subject: IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts
> > > > > >
> > > > > > Hi Igniters,
> > > > > >
> > > > > > We often see similar questions from users and customers about IgniteConfiguration, TcpDiscoverySpi and TcpCommunicationSpi timeouts and how they relate to each other. We also see several side effects of incorrect timeout configuration.
> > > > > >
> > > > > > I tried to briefly describe these timeout settings (please see below) and found that most of them make no sense in terms of cluster functions/operations and cannot be explained to users.
> > > > > >
> > > > > > I propose to deprecate most of them and keep only the timeouts we can explain in common terms (setFailureDetectionTimeout, setNetworkTimeout, setJoinTimeout and some others).
> > > > > >
> > > > > > Please let me know your thoughts.
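The four groups in Stan's May 28 proposal lend themselves to a simple configuration check. The sketch below is a hypothetical helper (`classify_timeouts` is not part of Ignite) that sorts property names into the proposal's items 1 to 3; anything not listed is treated as item 4, "don't touch".

```python
from typing import Dict, Iterable

# Property groups copied from items 1-3 of the proposal.
PREFERRED = {
    "IgniteConfiguration.failureDetectionTimeout",
    "IgniteConfiguration.clientFailureDetectionTimeout",
    "IgniteConfiguration.networkTimeout",
}
DISCOURAGED = {
    "TcpDiscoverySpi.socketTimeout",
    "TcpDiscoverySpi.ackTimeout",
    "TcpDiscoverySpi.maxAckTimeout",
    "TcpDiscoverySpi.reconnectCount",
    "TcpCommunicationSpi.connectTimeout",
    "TcpCommunicationSpi.maxConnectTimeout",
    "TcpCommunicationSpi.reconnectCount",
}
ORPHAN = {
    "TcpCommunicationSpi.socketWriteTimeout",
    "TcpDiscoverySpi.networkTimeout",
}


def classify_timeouts(props: Iterable[str]) -> Dict[str, str]:
    """Map each configured property name to the proposal group that covers it."""
    result: Dict[str, str] = {}
    for prop in props:
        if prop in PREFERRED:
            result[prop] = "preferred"    # item 1: rely on these
        elif prop in DISCOURAGED:
            result[prop] = "discouraged"  # item 2: to be deprecated and removed
        elif prop in ORPHAN:
            result[prop] = "orphan"       # item 3: to default to a global counterpart
        else:
            result[prop] = "unaffected"   # item 4: e.g. joinTimeout
    return result


configured = [
    "IgniteConfiguration.failureDetectionTimeout",
    "TcpDiscoverySpi.ackTimeout",
    "TcpDiscoverySpi.networkTimeout",
    "TcpDiscoverySpi.joinTimeout",
]
for prop, group in classify_timeouts(configured).items():
    print(f"{prop}: {group}")
```

A check like this could back the docs update in item 1 by flagging the discouraged and orphan properties in an existing configuration.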
> > > > > >
> > > > > > Thanks,
> > > > > > Alexey
> > > > > >
> > > > > > GLOBAL:
> > > > > >
> > > > > > IgniteConfiguration.setNetworkTimeout:
> > > > > > A global timeout for high-level operations that involve the network. For instance, IgniteMessaging delivery and the DiscoverySpi handshake use this timeout.
> > > > > >
> > > > > > IgniteConfiguration.setFailureDetectionTimeout:
> > > > > > A global timeout for detecting failures in IgniteSpi implementations (including DiscoverySpi and CommunicationSpi).
> > > > > > The failure detection algorithm bounds the whole series of simple network operations that make up a single logical operation (for instance, reliable delivery of a DiscoverySpi message within a cluster).
> > > > > > The failure detection timeout is a cumulative timeout covering the socket connection, sending and receiving data, and all possible socket retries (if some failure happens).
> > > > > > This timeout is intended to simplify the failure detection condition from the user's perspective.
> > > > > >
> > > > > > IgniteConfiguration.setClientFailureDetectionTimeout:
> > > > > > A special case of the failure detection timeout for DiscoverySpi on client nodes.
> > > > > >
> > > > > > TCP DISCOVERY SPI:
> > > > > >
> > > > > > If you need more control over the failure detection algorithm for TcpDiscoverySpi, you can explicitly use the following low-level options (setting them disables the failureDetectionTimeout logic):
> > > > > >
> > > > > > 1. TcpDiscoverySpi.setConnectTimeout - socket connection timeout
> > > > > > 2. TcpDiscoverySpi.setReconnectCount - number of reconnect attempts used when establishing a connection with the remote node and sending messages to it
> > > > > > 3. TcpDiscoverySpi.setSocketTimeout - socket write timeout. The write operation will be repeated getReconnectCount() times if it exceeds this timeout
> > > > > > 4. TcpDiscoverySpi.setAckTimeout - message acknowledgment timeout. If a message acknowledgment is not received within this timeout, sending is considered failed and the SPI will try to repeat the send operation. The timeout is automatically doubled on each successive retry, up to the getMaxAckTimeout value.
> > > > > > 5. TcpDiscoverySpi.setMaxAckTimeout - maximum acknowledgment timeout; if getAckTimeout reaches getMaxAckTimeout, the SPI gives up retrying
> > > > > >
> > > > > > Other important TcpDiscoverySpi timeouts:
> > > > > >
> > > > > > TcpDiscoverySpi.setJoinTimeout - timeout for the join process when a new or restarted node joins a cluster. The node tries to connect to all available IP addresses provided by the ipFinder within this timeout.
> > > > > > If the timeout is exceeded, the node gives up and throws an exception from Ignition.start().
> > > > > >
> > > > > > TcpDiscoverySpi.setNetworkTimeout - timeout for high-level operations like the handshake. It looks like it should be deprecated, and IgniteConfiguration.getNetworkTimeout should be used here instead.
> > > > > >
> > > > > > TCP COMMUNICATION SPI:
> > > > > >
> > > > > > If you need more control over the failure detection algorithm for TcpCommunicationSpi, you can explicitly use the following low-level options (setting them disables the failureDetectionTimeout logic):
> > > > > >
> > > > > > 1. TcpCommunicationSpi.setConnectTimeout - socket connection timeout; it is automatically doubled on each successive retry (up to getReconnectCount retries) related to a single logical operation
> > > > > > 2. TcpCommunicationSpi.setMaxConnectTimeout - maximum connection timeout, i.e. the upper limit that getConnectTimeout may reach as it is doubled on up to getReconnectCount retries
> > > > > > 3. TcpCommunicationSpi.setReconnectCount - number of reconnect attempts used when establishing a connection with the remote node and sending messages to it
> > > > > >
> > > > > > Other important TcpCommunicationSpi timeouts:
> > > > > >
> > > > > > TcpCommunicationSpi.setSocketWriteTimeout - timeout for sending a message.
> > > > > > TcpCommunicationSpi.setIdleConnectionTimeout - maximum idle time after which a connection is closed.
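Alexey's descriptions of ackTimeout/maxAckTimeout and of connectTimeout/maxConnectTimeout/reconnectCount follow the same pattern: a per-attempt timeout that doubles on each retry until a cap or an attempt limit is reached. The sketch below is a simplified, hypothetical model of that pattern (`backoff_schedule` is not an Ignite API, and details such as whether the cap is inclusive may differ from the real SPI code). Summing the schedule gives the worst-case time spent before giving up, which is the quantity failureDetectionTimeout replaces with a single cumulative budget.

```python
from typing import List, Optional


def backoff_schedule(initial_ms: int, max_ms: int,
                     max_attempts: Optional[int] = None) -> List[int]:
    """Per-attempt timeouts when every retry doubles the previous timeout.

    Simplified model: stop once the next timeout would exceed max_ms
    (the role of maxAckTimeout / maxConnectTimeout) or, if given, after
    max_attempts attempts (an attempt limit in the spirit of reconnectCount).
    """
    if initial_ms <= 0 or max_ms < initial_ms:
        raise ValueError("need 0 < initial_ms <= max_ms")
    schedule: List[int] = []
    timeout = initial_ms
    while timeout <= max_ms and (max_attempts is None or len(schedule) < max_attempts):
        schedule.append(timeout)
        timeout *= 2  # each retry waits twice as long as the previous one
    return schedule


# Arbitrary illustrative values, not Ignite defaults.
attempts = backoff_schedule(5_000, 60_000)
print(attempts)       # [5000, 10000, 20000, 40000]
print(sum(attempts))  # 75000: worst-case total wait before giving up
print(backoff_schedule(5_000, 60_000, max_attempts=2))  # [5000, 10000]
```

The worst-case total depends on three interacting knobs here, which illustrates why the thread favors a single cumulative failureDetectionTimeout that users can reason about directly.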