[ https://issues.apache.org/jira/browse/TRAFODION-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381140#comment-16381140 ]
ASF GitHub Bot commented on TRAFODION-2883:
-------------------------------------------

Github user zcorrea commented on a diff in the pull request:

    https://github.com/apache/trafodion/pull/1457#discussion_r171408759

    --- Diff: core/sqf/monitor/linux/zclient.cxx ---
    @@ -488,6 +488,103 @@ int CZClient::ZooExistRetry(zhandle_t *zh, const char *path, int watch, struct S
         return rc;
     }

    +const char* CZClient::WaitForAndReturnMaster( bool doWait )
    +{
    +    const char method_name[] = "CZClient::WaitForAndReturnMaster";
    +    TRACE_ENTRY;
    +
    +    bool found = false;
    +    int rc = -1;
    +    int retries = 0;
    +    Stat stat;
    +
    +    struct String_vector nodes = {0, NULL};
    +    stringstream ss;
    +    ss.str( "" );
    +    ss << zkRootNode_.c_str()
    +       << zkRootNodeInstance_.c_str()
    +       << ZCLIENT_MASTER_ZNODE;
    +    string masterMonitor( ss.str( ) );
    +
    +    // wait for 3 minutes for giving up.
    +    while ( (!found) && (retries < 180))
    +    {
    +        if (trace_settings & (TRACE_INIT | TRACE_RECOVERY))
    +        {
    +            trace_printf( "%s@%d trafCluster=%s\n"
    +                        , method_name, __LINE__, masterMonitor.c_str() );
    +        }
    +        // Verify the existence of the parent ZCLIENT_MASTER_ZNODE
    +        rc = ZooExistRetry( ZHandle, masterMonitor.c_str( ), 0, &stat );
    +
    +        if ( rc == ZNONODE )
    +        {
    +            if (doWait == false)
    +            {
    +                break;
    +            }
    +            continue;
    --- End diff --

    Yes, good catch!
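
A note on the loop shape at the flagged `continue`: the quoted hunk ends before the rest of the loop body, so it does not show where `retries` is incremented or where the loop sleeps. If both happen after the `continue`, the ZNONODE path would skip them and the loop could spin without ever reaching the 180-retry limit. That reading is an assumption, since the review remark being answered is not quoted in this message. The sketch below shows one way to keep such a wait bounded: a `for` loop whose header increments the counter on every pass. `Probe`, `WaitResult` and `waitForNode` are hypothetical names for illustration only; they are not Trafodion or ZooKeeper API, and the probe callable merely stands in for an existence check such as `ZooExistRetry`.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for the outcome of an existence check
// (e.g. a wrapper around ZooExistRetry); not a real Trafodion type.
enum class Probe { Found, Missing, Error };

// Hypothetical result type: whether the node was seen, and how many
// times the probe was called.
struct WaitResult
{
    bool found;
    int  attempts;
};

// Polls `probe` until it reports Found, at most `maxRetries` times.
// If doWait is false, gives up after the first Missing result.
WaitResult waitForNode( const std::function<Probe()>& probe
                      , bool doWait
                      , int maxRetries
                      , std::chrono::milliseconds delay )
{
    assert( maxRetries >= 0 );

    WaitResult result = { false, 0 };

    // The counter lives in the for header, so no path through the body
    // (including a `continue`) can skip the increment.
    for ( int retries = 0; retries < maxRetries; ++retries )
    {
        ++result.attempts;
        const Probe rc = probe();

        if ( rc == Probe::Found )
        {
            result.found = true;
            break;
        }
        if ( rc == Probe::Error || !doWait )
        {
            break;
        }

        // Node is missing and the caller asked to wait: pause, then retry.
        std::this_thread::sleep_for( delay );
    }
    return result;
}
```

Under these assumptions, `waitForNode( probe, true, 180, std::chrono::seconds( 1 ) )` would give up after roughly three minutes, which matches the comment in the diff only if the original loop sleeps about one second per retry; the quoted hunk does not show that delay.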
> Preliminary Trafodion Foundation Scalability Enhancements
> ---------------------------------------------------------
>
>                 Key: TRAFODION-2883
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2883
>             Project: Apache Trafodion
>          Issue Type: Improvement
>          Components: dtm, foundation, installer
>    Affects Versions: 2.3
>            Reporter: Gonzalo E Correa
>            Assignee: Gonzalo E Correa
>            Priority: Major
>             Fix For: 2.3
>
>
> Initial changes required to:
>  - AGENT mode monitor
>     o Preliminary change to remove dependency on OpenMPI during
>       initialization of operational cluster by creating a cluster
>       of one node (MASTER monitor) where other remote nodes (SLAVE
>       monitors) join the cluster through the MASTER
>  - MASTER monitor selection
>  - Scale bug fixes found when creating clusters greater than 120 nodes

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)