Github user zcorrea commented on a diff in the pull request:
https://github.com/apache/incubator-trafodion/pull/958#discussion_r102332356
--- Diff: core/sqf/monitor/linux/zclient.cxx ---
@@ -472,6 +469,65 @@ void CZClient::CheckCluster( void )
TRACE_EXIT;
}
+void CZClient::CheckMyZNode( void )
+{
+ const char method_name[] = "CZClient::CheckMyZNode";
+ TRACE_ENTRY;
+
+ int zerr;
+ struct timespec currentTime;
+
+ if ( IsCheckCluster() )
+ {
+ if (resetMyZNodeFailedTime_)
+ {
+ resetMyZNodeFailedTime_ = false;
+ clock_gettime(CLOCK_REALTIME, &myZNodeFailedTime_);
+ myZNodeFailedTime_.tv_sec += (GetSessionTimeout() * 2);
+ if (trace_settings & (TRACE_INIT | TRACE_RECOVERY))
+ {
+ trace_printf( "%s@%d" " - Resetting MyZnode Fail Time %ld(secs)\n"
+ , method_name, __LINE__
+ , myZNodeFailedTime_.tv_sec );
+ }
+ }
+ if ( ! IsZNodeExpired( Node_name, zerr ) )
+ {
+ if ( zerr == ZCONNECTIONLOSS || zerr == ZOPERATIONTIMEOUT )
+ {
+ // Ignore transient errors with the quorum.
+ // However, if longer than the session
+ // timeout, handle it as a hard error.
+ clock_gettime(CLOCK_REALTIME, &currentTime);
+ if (currentTime.tv_sec > myZNodeFailedTime_.tv_sec)
--- End diff --
The desired behavior is to continually reset the local node's ZNode
expiration time while the ZNode has not expired, which is the normal state. The
only errors returned from IsZNodeExpired that are handled at this
point are communication errors with the ZooKeeper quorum, i.e., connection
loss and operation timeout. These errors can be transient, but if they persist
beyond myZNodeFailedTime_ they indicate that the local ZNode has gone beyond
the local monitor's session expiration time window, and the local monitor must
bring itself down.
So yes, setting resetMyZNodeFailedTime_ to true resets
myZNodeFailedTime_ on each iteration (every zcMonitoringRateValue seconds),
pushing the expiration time out until there is a communication failure with the
ZooKeeper quorum.
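
To make the intended behavior concrete, here is a minimal standalone sketch of the
deadline logic described above. It is not the monitor's code: ZNodeDeadline,
CheckResult, and Action are hypothetical names, the clock is passed in as plain
seconds instead of calling clock_gettime, and the place where the reset flag is
set back to true is assumed from the description (the diff excerpt does not show it).

```cpp
// Hypothetical stand-ins for the outcome of the ZNode check. In the diff these
// correspond to a clean result, ZCONNECTIONLOSS, and ZOPERATIONTIMEOUT.
enum class CheckResult { Ok, ConnectionLoss, OperationTimeout };

// What the caller should do after one monitoring iteration.
enum class Action { Continue, Shutdown };

class ZNodeDeadline
{
public:
    explicit ZNodeDeadline(long sessionTimeoutSecs)
        : sessionTimeoutSecs_(sessionTimeoutSecs)
        , reset_(true)
        , deadlineSecs_(0)
    {}

    // One iteration of the check. nowSecs is the current time in seconds,
    // injected by the caller so the logic can be exercised without a real clock.
    Action Check(CheckResult result, long nowSecs)
    {
        if (reset_)
        {
            // Same arithmetic as the diff: now + 2 * session timeout.
            reset_ = false;
            deadlineSecs_ = nowSecs + sessionTimeoutSecs_ * 2;
        }

        if (result == CheckResult::Ok)
        {
            // Assumption: a healthy check re-arms the reset, so the next
            // iteration pushes the deadline out again.
            reset_ = true;
            return Action::Continue;
        }

        // Transient quorum error: tolerate it until the deadline passes.
        return (nowSecs > deadlineSecs_) ? Action::Shutdown : Action::Continue;
    }

    long DeadlineSecs() const { return deadlineSecs_; }

private:
    long sessionTimeoutSecs_;
    bool reset_;
    long deadlineSecs_;
};
```

With a 60-second session timeout, a healthy check at t=0 followed by connection
loss at t=5 sets the deadline to 125. Further errors at or before t=125 are
tolerated, and an error at t=126 tells the caller to shut down. A healthy check in
between would re-arm the reset and move the deadline forward on the next iteration.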