To start the distributed version without Mesos, we run a script that SSHes into every node and executes the command: ./singa --model=... --cluster=...
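For illustration only, here is a minimal Python sketch of such a launcher. It is not the actual script. The hostfile format (one hostname per line), the working directory argument, and all function names are assumptions; only the ./singa command with its --model and --cluster flags comes from the message above.

```python
import shlex
import subprocess


def read_hosts(hostfile_text):
    """Return hostnames, assuming one per line; blank and '#' lines are skipped."""
    hosts = []
    for line in hostfile_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            hosts.append(line)
    return hosts


def build_ssh_commands(hosts, singa_dir, model_conf, cluster_conf):
    """Build one ssh argv per host. Nothing is executed here."""
    # singa_dir, model_conf and cluster_conf are caller-supplied (hypothetical) paths.
    remote = "cd {} && ./singa --model={} --cluster={}".format(
        shlex.quote(singa_dir), shlex.quote(model_conf), shlex.quote(cluster_conf)
    )
    return [["ssh", host, remote] for host in hosts]


def launch(commands):
    """Start all ssh commands in parallel and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]
```

Building the commands separately from running them makes it possible to print and inspect them before anything touches the cluster.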
I planned to push the distributed version yesterday but haven't finished testing it. I will push it ASAP (tonight or tomorrow morning).

regards,
wang wei

On Tue, Jun 23, 2015 at 7:39 PM, Wang Wei <[email protected]> wrote:

> In the current implementation, every process registers itself with
> ZooKeeper, including its hostname and port number.
> This should work well for a cluster of nodes connected over a local
> network, because every node can be identified by its hostname.
> Registering the hostname instead of the IP leaves room for further
> optimization with multiple network cards. For example, if there are two
> Ethernet cards per node, we may optimize which card to use at run time.
> If we used the IP directly, we would fix the Ethernet card and could not
> do this kind of optimization.
>
> For cluster.conf and model.conf, we assume they are accessible on
> each node (e.g., on NFS or local disk). Hence, each process loads
> these two configuration files from disk instead of reading them from
> ZooKeeper.
>
> Do you want to put the content of hostfile and cluster.conf into
> ZooKeeper, and let Mesos read this information from ZooKeeper?
>
> regards,
> wang wei
>
> On Tue, Jun 23, 2015 at 7:24 PM, Anh Dinh <[email protected]> wrote:
>
>> Thanks WangSheng,
>>
>> Should I also assume that the distributed version will read both the
>> "hostfile" and "cluster.conf" files from the ZooKeeper service? It'd
>> make it easier for Mesos to manage Singa.
>>
>> Cheers,
>> Anh.
>>
>>
>> On 23 June 2015 at 11:25, WANG Sheng <[email protected]> wrote:
>>
>> > Hi Anh,
>> >
>> > You are correct. Starting and stopping the ZooKeeper service is only
>> > required in the standalone version.
>> > In the distributed version, the ZooKeeper service is always on and
>> > managed by the users themselves.
>> >
>> > In this case, before running a singa job, ZooKeeper needs to be
>> > initialized (e.g. clean old data, create missing paths).
>> > This initialization phase should only be executed once. In the current
>> > singa architecture, there is no master node,
>> > hence we need an external tool to do the job before launching singa.
>> > (We are planning to implement it next.)
>> >
>> > Your request to write the "hostfile" into ZooKeeper could also be
>> > handled by the same tool.
>> > Could you write your code as a new file /support/main.cc, which we can
>> > later extend to be the tool?
>> >
>> > The ZooKeeper API is included in the ZKClusterRT class in
>> > /utils/cluster_rt.h.
>> > I will make create_zk_node a public function for you to use.
>> >
>> > By the way, we need to attach a JIRA ticket to each commit.
>> > Please create a JIRA ticket according to the guide on the singa
>> > website. (You need to create a JIRA account first.)
>> >
>> > Best Regards,
>> >
>> > Sheng
>> >
>> >
>> >
>> > On Tue, Jun 23, 2015 at 9:54 AM, Anh Dinh <[email protected]> wrote:
>> >
>> > > Hi guys,
>> > >
>> > > Currently the standalone version (I followed the Quickstart guide)
>> > > starts the ZooKeeper service every time I run "singa-run.sh".
>> > >
>> > > I assume that in the distributed version, the ZK service will be
>> > > started only once by a master node, and the rest of the cluster will
>> > > know the ZK master address?
>> > >
>> > > In this case, since I'm writing Mesos support for singa, could I have
>> > > the following API from the SingaZooKeeperService (or any class that
>> > > implements the ZK service for Singa)?
>> > >
>> > > /**
>> > >  * create a node with given name and content.
>> > >  * the node can be located at $ZK_PREFIX/filename
>> > >  */
>> > > static SingaZooKeeperService::create_zk_node(string filename, string content);
>> > >
>> > > I'm using this to write the content of the "hostfile" so that all
>> > > nodes can see it.
>> > >
>> > > Anh.
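
P.S. To make the create_zk_node idea from the thread concrete, here is a toy Python sketch. It uses an in-memory dictionary as a stand-in for the ZooKeeper tree, so it is neither the ZKClusterRT API nor a real ZooKeeper client. The class name, the method signatures, and the example prefix are all hypothetical; the only part taken from the thread is that content ends up at $ZK_PREFIX/filename and that missing paths are created first.

```python
import posixpath


class FakeZkTree:
    """In-memory stand-in for a ZooKeeper namespace (path -> bytes)."""

    def __init__(self):
        self.nodes = {"/": b""}

    def ensure_path(self, path):
        """Create every missing ancestor of path, then path itself."""
        current = "/"
        for part in path.strip("/").split("/"):
            if not part:
                continue
            current = posixpath.join(current, part)
            self.nodes.setdefault(current, b"")

    def create_zk_node(self, prefix, filename, content):
        """Store content at prefix/filename, creating missing parents."""
        self.ensure_path(prefix)
        path = posixpath.join(prefix, filename)
        if path in self.nodes:
            raise KeyError("node already exists: " + path)
        self.nodes[path] = content.encode("utf-8")
        return path


if __name__ == "__main__":
    zk = FakeZkTree()
    # "/singa/demo" is a made-up prefix used only for this example.
    print(zk.create_zk_node("/singa/demo", "hostfile", "node1\nnode2\n"))
```

Refusing to overwrite an existing node mirrors the "initialize once" concern in the thread: a second initialization attempt fails loudly instead of silently replacing the hostfile.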
