[ https://issues.apache.org/jira/browse/TRAFODION-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15519318#comment-15519318 ]
ASF GitHub Bot commented on TRAFODION-2245:
-------------------------------------------
GitHub user selvaganesang opened a pull request:
https://github.com/apache/incubator-trafodion/pull/726
[TRAFODION-2245] Multiple sqcheck and jps processes running when monitor is
downed and up as dcsserver checks if trafodion is up
A new option, sqcheck -n <nid>, is introduced to determine whether the node
is up and running.
The DCS server now uses this option when it detects that the server is gone,
rather than at the time of starting the processes.
Also fixed the failure of hive/TEST018, which was caused by a difference in
the expected file.
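
A rough sketch of the pattern described above, written in Python for brevity.
It is not the actual patch, which is not shown in this message. The node check
runs only after a server is found to be gone, and it is bounded by a timeout so
that a hung check cannot pile up. The function names, the 10-second timeout,
and the reading of exit status 0 as "node up" are assumptions; only the
sqcheck -n <nid> option comes from the description.

```python
import subprocess
import sys


def sqcheck_command(nid):
    """Build the node check command named in the PR description."""
    return ["sqcheck", "-n", str(nid)]


def node_is_up(check_cmd, timeout_s=10.0):
    """Return True if check_cmd exits with status 0 within timeout_s seconds.

    Assumption: exit status 0 means "node up". The real sqcheck exit
    codes are not described in this message.
    """
    try:
        result = subprocess.run(
            check_cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before raising,
        # so a hung check is not left behind.
        return False
    except OSError:
        # The command could not be started (for example, it is not on PATH).
        return False
    return result.returncode == 0


def handle_server_gone(nid, restart, check_cmd=None, timeout_s=10.0):
    """Run the node check only after a server was detected as gone.

    `restart` is a hypothetical callback that restarts the server process.
    Returns True if the restart was attempted.
    """
    cmd = check_cmd if check_cmd is not None else sqcheck_command(nid)
    if node_is_up(cmd, timeout_s=timeout_s):
        restart()
        return True
    return False


if __name__ == "__main__":
    # Stand-in command so the sketch runs without a Trafodion installation.
    stand_in = [sys.executable, "-c", "raise SystemExit(0)"]
    print(node_is_up(stand_in))
```

Bounding the check with a timeout is one possible way to avoid accumulating
hung sqcheck processes; whether the patch does this is not stated here.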
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/selvaganesang/incubator-trafodion trafodion-2245
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-trafodion/pull/726.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #726
----
commit 7bc317d0c44cd5e0a75ce2661687a2fd3d562cc2
Author: selvaganesang
Date: 2016-09-24T17:03:18Z
[TRAFODION-2245] Multiple sqcheck and jps processes running when monitor is
downed and up
as dcsserver checks if trafodion is up
A new option sqcheck -n <nid> is introduced to determine if the node is up
and running
DCS server is changed to use this option when it detects the server is gone
rather than at the time of starting the processes
Also fixed the failure of hive/TEST018 due to expected file difference
----
> Multiple sqcheck and jps processes running when monitor is downed and up as
> dcsserver checks if trafodion is up
> ---------------------------------------------------------------------------------------------------------------
>
> Key: TRAFODION-2245
> URL: https://issues.apache.org/jira/browse/TRAFODION-2245
> Project: Apache Trafodion
> Issue Type: Bug
> Components: dcs
> Affects Versions: 2.1-incubating
> Environment: Testing trafodion when failures occurred. HDP 2.4
> distro contents and a standard installation on CentOS 6
> Reporter: Carol Pearson
> Assignee: Selvaganesan Govindarajan
>
> Dcsserver checks whether Trafodion is running by using sqcheck, which can
> hang in some circumstances.
> In this case we had a DTM failure and recovery took a while. The node went
> into a SoftDown state while the DTM recovered. Meanwhile, dcsserver kept
> checking for Trafodion to come up so that it could start the mxosrvrs on
> that node. That resulted in many hung sqcheck processes; the notable
> symptom is that they all had the same ppid.
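
The symptom described above (many sqcheck processes sharing one ppid) could be
spotted with a small script along these lines. This is only an illustrative
sketch: it assumes a Linux ps that accepts `-eo pid,ppid,comm`, and it assumes
the check shows up under the command name "sqcheck", which may not hold if the
script runs under a shell. The function names are made up for this example.

```python
import subprocess
from collections import defaultdict


def group_by_parent(ps_output, command_name):
    """Group PIDs of processes named command_name by their parent PID.

    ps_output is text in the three-column form "PID PPID COMMAND".
    Lines whose first two fields are not numeric (such as the header)
    are skipped.
    """
    groups = defaultdict(list)
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        pid, ppid, comm = fields
        if not (pid.isdigit() and ppid.isdigit()):
            continue
        if comm.strip() == command_name:
            groups[int(ppid)].append(int(pid))
    return dict(groups)


def process_snapshot():
    """Return the current process table as text (assumes a Linux ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    for ppid, pids in group_by_parent(process_snapshot(), "sqcheck").items():
        if len(pids) > 1:
            print(f"parent {ppid} has {len(pids)} sqcheck children: {pids}")
```

A parent with a growing list of children across repeated runs would match the
pattern reported in this issue.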