[ https://issues.apache.org/jira/browse/TRAFODION-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15527028#comment-15527028 ]
ASF GitHub Bot commented on TRAFODION-2235:
-------------------------------------------
Github user zcorrea commented on a diff in the pull request:
https://github.com/apache/incubator-trafodion/pull/724#discussion_r80762682
--- Diff: core/sqf/monitor/linux/monitor.cxx ---
@@ -1294,7 +1441,31 @@ int main (int argc, char *argv[])
}
}
+ if ( IsRealCluster )
--- End diff --
IsRealCluster is true by default.
If the environment variable SQ_VIRTUAL_NODES exists, it is set to false.
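The rule described in this comment could be sketched roughly as follows. This is only an illustration, not the monitor's actual code: the helper name DetectRealCluster is hypothetical, and the sketch assumes that only the presence of SQ_VIRTUAL_NODES matters, not its value.

```cpp
#include <cstdlib>

// Hypothetical helper (not taken from monitor.cxx): returns the value that
// IsRealCluster would take under the rule described in the comment.
bool DetectRealCluster()
{
    // Assumption: the flag defaults to true and flips to false as soon as
    // SQ_VIRTUAL_NODES is defined in the environment, whatever its value.
    return std::getenv("SQ_VIRTUAL_NODES") == nullptr;
}
```

Under these assumptions, a caller would evaluate something like `bool IsRealCluster = DetectRealCluster();` once at startup, before reaching the `if ( IsRealCluster )` branch shown in the diff.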
> Enhance node failure detection and coordination
> -----------------------------------------------
>
> Key: TRAFODION-2235
> URL: https://issues.apache.org/jira/browse/TRAFODION-2235
> Project: Apache Trafodion
> Issue Type: Bug
> Components: foundation, installer
> Affects Versions: 2.1-incubating
> Reporter: Gonzalo E Correa
> Assignee: Gonzalo E Correa
>
> Certain server and network failures are not detected by the monitor processes,
> which causes a safety net failure detection mechanism to trigger in all
> Trafodion nodes. This safety net mechanism is controlled by the environment
> variable SQ_MON_SYNC_TIMEOUT, currently set at 15 minutes.
> This JIRA is to enhance the node failure mechanism in the Trafodion
> foundation components, specifically the monitor process, to detect a
> non-responsive node and handle it as a node down condition when a
> configurable timeout event is detected prior to the safety net failure
> mechanism described above.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)