[
https://issues.apache.org/jira/browse/TRAFODION-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637838#comment-15637838
]
Arvind Narain commented on TRAFODION-2328:
------------------------------------------
Thanks for working on this defect. Could you share how these new options (la,
lq, key) will be used?
For the problem seen when one of the zookeeper nodes/processes in the quorum
is down, I think we need to pass the full quorum information to the "-server"
option. Currently we pick a single server and port and pass only that. With the
change, invoking "dcs zkcli" should default to showing all servers of the
quorum, as hbase zkcli does:
> hbase zkcli
Connecting to wmstesting-1.novalocal:2181,wmstesting-3.novalocal:2181,wmstesting-2.novalocal:2181...
> bin/dcs zkcli
Connecting to wmstesting-1.novalocal:2181
If you do make this change to -server, then note that dcscheck and dcsstart
will need to be modified to pick the correct row/column.
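To illustrate what passing the whole quorum to "-server" could look like, here is a minimal shell sketch. The function name build_quorum is hypothetical and is not part of the dcs scripts; it only joins a list of hosts and a client port into the comma-separated form that hbase zkcli prints above.

```shell
# Hypothetical helper (not part of dcs): join hosts and one client port
# into a ZooKeeper connect string "host1:port,host2:port,...".
# Usage: build_quorum PORT HOST [HOST...]
# The body runs in a subshell so its variables do not leak to the caller.
build_quorum() (
  port="$1"
  shift
  out=""
  for host in "$@"; do
    # Prefix a comma only when something has already been accumulated.
    out="${out:+$out,}${host}:${port}"
  done
  printf '%s\n' "$out"
)

quorum="$(build_quorum 2181 wmstesting-1.novalocal wmstesting-2.novalocal wmstesting-3.novalocal)"
echo "Connecting to ${quorum}"
```

The resulting string would then be given to the "-server" option in place of the single host:port pair, so the client can fail over to another quorum member when one is down.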
Another suggestion: we could eliminate the configuration in dcs-site.xml by
using the following env variables, which are set in a cluster environment:
[trafodion@wmstesting-1 dcs-2.2.0]$ env|grep ZOO
ZOOKEEPER=zookeeper
ZOO_PORT_NODES=wmstesting-2.novalocal:2181,wmstesting-1.novalocal:2181,wmstesting-3.novalocal:2181
ZOOKEEPER_PORT=2181
ZOOKEEPER_NODES=wmstesting-3.novalocal,wmstesting-1.novalocal,wmstesting-2.novalocal
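A rough sketch of how a script might derive the quorum from these variables follows. The function name resolve_zk_quorum, the order of preference, and the fallback to port 2181 when ZOOKEEPER_PORT is unset are all assumptions made for illustration, not existing dcs behavior.

```shell
# Hypothetical helper (not part of dcs): derive the ZooKeeper connect
# string from the cluster environment variables listed above.
resolve_zk_quorum() {
  if [ -n "${ZOO_PORT_NODES:-}" ]; then
    # Already in host:port,host:port form.
    printf '%s\n' "$ZOO_PORT_NODES"
  elif [ -n "${ZOOKEEPER_NODES:-}" ]; then
    # Append ":port" to every comma-separated host.
    # Assumption: fall back to 2181 if ZOOKEEPER_PORT is not set.
    printf '%s\n' "$ZOOKEEPER_NODES" |
      sed "s/,/:${ZOOKEEPER_PORT:-2181},/g; s/\$/:${ZOOKEEPER_PORT:-2181}/"
  else
    # Nothing usable in the environment; let the caller decide what to do.
    return 1
  fi
}

# Example use in a script:
#   quorum="$(resolve_zk_quorum)" || { echo "no zookeeper env found" >&2; exit 1; }
```

Returning a non-zero status when neither variable is set would let the caller keep reading dcs-site.xml as a fallback instead of failing outright.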
> There is no active DcsMaster in the output of the dcscheck when stop the lead
> zookeeper that is the active DcsMaster
> --------------------------------------------------------------------------------------------------------------------
>
> Key: TRAFODION-2328
> URL: https://issues.apache.org/jira/browse/TRAFODION-2328
> Project: Apache Trafodion
> Issue Type: Bug
> Reporter: taian.wei
> Assignee: taian.wei
>
> Suppose the active DcsMaster and the lead zookeeper are both on node A. Now
> stop the lead zookeeper on node A, run dcscheck on node B, and check the
> active DcsMaster in its output: errors appear and no active DcsMaster is
> displayed.
> errors:
> Exception in thread "main" org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /trafodion/dcs/master
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1468)
>     at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1496)
>     at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:725)
>     at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:593)
>     at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:365)
>     at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323)
>     at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282)