It'd make things easier for Mesos if it did not have to copy the config files to all the nodes. Whether that is achieved through NFS or ZK is not really important for Mesos.
For now I'd assume NFS. But it's reasonable to expect that users don't always have access to NFS. For instance, a user may simply launch VMs in EC2 for testing.

On 23 June 2015 at 19:43, Wang Wei <[email protected]> wrote:
> To start the distributed version without Mesos, we run a script which sshes into every node and executes the command:
> ./singa --model=... --cluster=...
>
> I planned to push the distributed version yesterday but haven't finished testing. I will push it ASAP (tonight or tomorrow morning).
>
> regards,
> wang wei
>
> On Tue, Jun 23, 2015 at 7:39 PM, Wang Wei <[email protected]> wrote:
>
> > In the current implementation, all processes register themselves with zookeeper, including their hostname and port number. This should work well for a cluster of nodes connected by a local network, because every node can be identified by its hostname.
> > We register the hostname instead of the IP to allow further optimization on nodes with multiple network cards. For example, if there are two ethernet cards per node, we may optimize which card to use in real time. If we used the IP directly, we would fix the ethernet card and could not do this kind of optimization.
> >
> > For cluster.conf and model.conf, we assume they are accessible from each node (e.g., on NFS or a local disk). Hence, each process loads these two configuration files from disk instead of reading them from Zookeeper.
> >
> > Do you want to put the content of hostfile and cluster.conf into zookeeper, and let Mesos read this information from zookeeper?
> >
> > regards,
> > wang wei
> >
> > On Tue, Jun 23, 2015 at 7:24 PM, Anh Dinh <[email protected]> wrote:
> >
> >> Thanks WangSheng,
> >>
> >> Should I also assume that the distributed version will read both the "hostfile" and "cluster.conf" files from the Zookeeper service? It'd make it easier for Mesos to manage Singa.
> >>
> >> Cheers,
> >> Anh.
> >>
> >> On 23 June 2015 at 11:25, WANG Sheng <[email protected]> wrote:
> >>
> >> > Hi Anh,
> >> >
> >> > You are correct. Starting/stopping the zookeeper service is only required in the standalone version. In the distributed version, the zookeeper service is always on and managed by users themselves.
> >> >
> >> > In this case, before running a singa job, zookeeper needs to be initialized (e.g. clean old data, create missing paths). This initialization phase should only be executed once. In the current singa architecture there is no master node, hence we need an external tool to do the job before launching singa. (We are planning to implement it next.)
> >> >
> >> > Your request of writing "hostfile" into zookeeper could also be handled by the same tool. Could you write your code as a new file /support/main.cc, which we can later extend into the tool?
> >> >
> >> > The zookeeper API is included in the ZKClusterRT class in /utils/cluster_rt.h. I will make create_zk_node a public function for you to use.
> >> >
> >> > By the way, each commit needs an attached jira ticket. Please create a jira ticket according to the guide on the singa website. (You need to create a jira account first.)
> >> >
> >> > Best Regards,
> >> >
> >> > Sheng
> >> >
> >> > On Tue, Jun 23, 2015 at 9:54 AM, Anh Dinh <[email protected]> wrote:
> >> >
> >> > > Hi guys,
> >> > >
> >> > > Currently the standalone version (I followed the Quickstart guide) starts the Zookeeper service every time I run "singa-run.sh".
> >> > >
> >> > > I assume that in the distributed version, the ZK service will be started only once by a master node, and the rest of the cluster will know the ZK master address?
> >> > >
> >> > > In this case, since I'm writing Mesos support for singa, could I have the following API from the SingaZooKeeperService (or any class that implements the ZK service for Singa)?
> >> > >
> >> > > /**
> >> > > * create a node with given name and content.
> >> > > * the node can be located at $ZK_PREFIX/filename
> >> > > */
> >> > > static SingaZooKeeperService::create_zk_node(string filename, string content);
> >> > >
> >> > > I'm using this to write the content of "hostfile" so that all nodes can see it.
> >> > >
> >> > > Anh.
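Anh's proposed call places a node at $ZK_PREFIX/filename, and Sheng's initialization tool has to "create missing path". A ZooKeeper create fails with ZNONODE when the parent znode does not exist, so such a tool would need to create each ancestor first, in order. The minimal C++ sketch below covers only that path handling. `ZnodePathFor` and `ZnodesToCreate` are hypothetical helpers. They are not part of ZKClusterRT or /utils/cluster_rt.h, and the sketch leaves out the actual ZooKeeper calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of SINGA): join a ZooKeeper prefix, such as
// the value of $ZK_PREFIX, with a file name, e.g. ("/singa", "hostfile")
// -> "/singa/hostfile". Adds a missing leading '/' and drops trailing '/'.
std::string ZnodePathFor(const std::string& prefix, const std::string& filename) {
  std::string path = prefix;
  if (path.empty() || path.front() != '/') path = "/" + path;
  while (path.size() > 1 && path.back() == '/') path.pop_back();
  if (path != "/") path += '/';
  return path + filename;
}

// Hypothetical helper: list every znode that must exist for `path`, parents
// first. ZooKeeper creates one znode per call and rejects a create whose
// parent is missing (ZNONODE), so an init tool would create these in order.
std::vector<std::string> ZnodesToCreate(const std::string& path) {
  assert(!path.empty() && path.front() == '/');  // absolute paths only
  std::vector<std::string> nodes;
  for (std::string::size_type pos = path.find('/', 1); pos != std::string::npos;
       pos = path.find('/', pos + 1)) {
    nodes.push_back(path.substr(0, pos));  // ancestor ending before this '/'
  }
  if (path.size() > 1) nodes.push_back(path);  // the target node itself
  return nodes;
}
```

For example, `ZnodesToCreate(ZnodePathFor("/singa/job1", "hostfile"))` yields `/singa`, `/singa/job1` and `/singa/job1/hostfile`. A tool would then call the client's create function (e.g. `zoo_create` in the ZooKeeper C client) on each path. It would treat ZNODEEXISTS on the ancestors as success, so the step stays safe to repeat. Only the last node would carry the hostfile content.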
