To start the distributed version without Mesos, we run a script that connects
to every node via ssh and executes the command:
./singa  --model=...   --cluster=...
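
The launch step above can be sketched as follows. This is a minimal illustration, not the actual script, which is not shown in this thread. It assumes a hostfile with one hostname per line (with '#' starting a comment) and a working directory shared by all nodes. The helper names `ParseHostfile` and `BuildSshCommands`, and the directory and config names used in the usage note, are hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse a hostfile (assumed format: one hostname per
// line, '#' starts a comment, blank lines ignored) into a list of hosts.
std::vector<std::string> ParseHostfile(const std::string& text) {
  std::vector<std::string> hosts;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    const auto hash = line.find('#');
    if (hash != std::string::npos) line.erase(hash);  // drop the comment
    const auto begin = line.find_first_not_of(" \t\r");
    if (begin == std::string::npos) continue;          // blank line
    const auto end = line.find_last_not_of(" \t\r");
    hosts.push_back(line.substr(begin, end - begin + 1));
  }
  return hosts;
}

// Hypothetical helper: build one ssh command per node that runs ./singa
// with the shared configuration files. Arguments are NOT shell-escaped,
// so this assumes paths without spaces or quotes.
std::vector<std::string> BuildSshCommands(const std::vector<std::string>& hosts,
                                          const std::string& workdir,
                                          const std::string& model_conf,
                                          const std::string& cluster_conf) {
  std::vector<std::string> cmds;
  for (const auto& host : hosts) {
    cmds.push_back("ssh " + host + " \"cd " + workdir +
                   " && ./singa --model=" + model_conf +
                   " --cluster=" + cluster_conf + "\"");
  }
  return cmds;
}
```

For example, `BuildSshCommands(ParseHostfile(text), "/shared/singa", "model.conf", "cluster.conf")` yields one string like `ssh node1 "cd /shared/singa && ./singa --model=model.conf --cluster=cluster.conf"` per host. A real launcher would also escape its arguments and run the commands in parallel.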

I had planned to push the distributed version yesterday but haven't finished
testing it yet. I will push it ASAP (tonight or tomorrow morning).

regards,
wang wei



On Tue, Jun 23, 2015 at 7:39 PM, Wang Wei <[email protected]> wrote:

> In the current implementation, every process registers itself with
> ZooKeeper, including its hostname and port number.
> This should work well for a cluster of nodes connected over a local
> network, because every node can be identified by its hostname.
> We register the hostname instead of the IP address to allow further
> optimization for nodes with multiple network cards. For example, if there
> are two Ethernet cards per node, we may optimize which card to use at run
> time. If we used the IP address directly, we would be tied to one Ethernet
> card and could not do this kind of optimization.
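
To make the registration scheme concrete, here is a sketch of what a process might store. The "hostname:port" value format, the `/proc-<id>` path layout, and all names below are assumptions for illustration, not SINGA's actual encoding. The point is that the stored value names a host rather than an address. That defers the choice of network interface, and hence the IP, until a peer actually connects.

```cpp
#include <optional>
#include <string>

// Hypothetical registration record for one SINGA process.
struct Endpoint {
  std::string host;  // hostname; resolved to an IP only when a peer connects
  int port;
};

// Assumed value format: "hostname:port".
std::string SerializeEndpoint(const Endpoint& ep) {
  return ep.host + ":" + std::to_string(ep.port);
}

// Parse "hostname:port"; returns std::nullopt if the value is malformed.
std::optional<Endpoint> ParseEndpoint(const std::string& value) {
  const auto colon = value.rfind(':');
  if (colon == std::string::npos || colon == 0) return std::nullopt;
  const std::string digits = value.substr(colon + 1);
  if (digits.empty() || digits.size() > 5) return std::nullopt;
  for (char c : digits)
    if (c < '0' || c > '9') return std::nullopt;
  const int port = std::stoi(digits);
  if (port > 65535) return std::nullopt;
  return Endpoint{value.substr(0, colon), port};
}

// Assumed znode layout: one child per process under the job's prefix.
std::string RegistrationPath(const std::string& prefix, int proc_id) {
  return prefix + "/proc-" + std::to_string(proc_id);
}
```
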
>
> For cluster.conf and model.conf, we assume they are accessible from
> each node (e.g., on NFS or a local disk). Hence, each process loads
> these two configuration files from disk instead of reading them from
> ZooKeeper.
>
> Do you want to put the content of the hostfile and cluster.conf into
> ZooKeeper, and let Mesos read this information from ZooKeeper?
>
> regards,
> wang wei
>
> On Tue, Jun 23, 2015 at 7:24 PM, Anh Dinh <[email protected]> wrote:
>
>> Thanks WangSheng,
>>
>> Should I also assume that the distributed version will read both the
>> "hostfile" and "cluster.conf" files from the ZooKeeper service? It'd make
>> it easier for Mesos to manage SINGA.
>>
>> Cheers,
>> Anh.
>>
>>
>> On 23 June 2015 at 11:25, WANG Sheng <[email protected]> wrote:
>>
>> > Hi Anh,
>> >
>> > You are correct. Starting/stopping the ZooKeeper service is only
>> > required in the standalone version.
>> > In the distributed version, the ZooKeeper service is always on and is
>> > managed by the users themselves.
>> >
>> > In this case, before running a SINGA job, ZooKeeper needs to be
>> > initialized (e.g., clean old data, create missing paths).
>> > This initialization phase should be executed only once. In the current
>> > SINGA architecture there is no master node, so we need an external tool
>> > to do this job before launching SINGA. (We are planning to implement it
>> > next.)
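
The one-time initialization described above (clean old data, then create missing paths) can be sketched as follows. To keep the example runnable without a ZooKeeper server, `FakeZnodeStore` is a hypothetical in-memory stand-in that mimics two ZooKeeper rules: a node can only be created under an existing parent, and only a node without children can be deleted. A real tool would issue the corresponding calls through the ZooKeeper client, e.g. via ZKClusterRT. `InitNamespace` and the `/hosts` and `/procs` paths in the usage note are likewise hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a ZooKeeper namespace (path -> data).
// Like ZooKeeper, a node can be created only if its parent exists, and
// deleted only if it has no children.
class FakeZnodeStore {
 public:
  FakeZnodeStore() { nodes_["/"] = ""; }

  bool Exists(const std::string& path) const { return nodes_.count(path) > 0; }

  bool Create(const std::string& path, const std::string& data) {
    if (Exists(path) || !Exists(Parent(path))) return false;
    nodes_[path] = data;
    return true;
  }

  bool Delete(const std::string& path) {
    if (!Exists(path) || HasChildren(path)) return false;
    nodes_.erase(path);
    return true;
  }

  // Create `path` plus any missing ancestors (like `mkdir -p`).
  void CreateRecursive(const std::string& path) {
    for (std::string::size_type pos = path.find('/', 1);; pos = path.find('/', pos + 1)) {
      const std::string prefix = path.substr(0, pos);
      if (!Exists(prefix)) Create(prefix, "");
      if (pos == std::string::npos) break;
    }
  }

  // Delete `path` and its whole subtree, children before parents.
  void DeleteRecursive(const std::string& path) {
    std::vector<std::string> subtree;
    for (const auto& kv : nodes_)
      if (kv.first == path || IsUnder(kv.first, path)) subtree.push_back(kv.first);
    // std::map iterates in lexicographic order, where a child always sorts
    // after its parent, so reverse order visits children first.
    for (auto it = subtree.rbegin(); it != subtree.rend(); ++it) Delete(*it);
  }

 private:
  static std::string Parent(const std::string& path) {
    const auto pos = path.rfind('/');
    return pos == 0 ? "/" : path.substr(0, pos);
  }
  static bool IsUnder(const std::string& node, const std::string& path) {
    const std::string prefix = path + "/";
    return node.compare(0, prefix.size(), prefix) == 0;
  }
  bool HasChildren(const std::string& path) const {
    for (const auto& kv : nodes_)
      if (IsUnder(kv.first, path)) return true;
    return false;
  }

  std::map<std::string, std::string> nodes_;
};

// One-time initialization before launching a job: wipe data left by an
// earlier run under `root`, then recreate the paths the job expects.
void InitNamespace(FakeZnodeStore& zk, const std::string& root,
                   const std::vector<std::string>& required) {
  if (zk.Exists(root)) zk.DeleteRecursive(root);
  for (const auto& rel : required) zk.CreateRecursive(root + rel);
}
```

For example, `InitNamespace(zk, "/singa", {"/hosts", "/procs"})` removes everything under `/singa` and leaves `/singa`, `/singa/hosts` and `/singa/procs` in place. Running it only once, before any SINGA process starts, matters because it deletes whatever live processes may have registered.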
>> >
>> > Your request to write the "hostfile" into ZooKeeper could also be
>> > handled by the same tool.
>> > Could you write your code as a new file, /support/main.cc, which we can
>> > later extend into that tool?
>> >
>> > The ZooKeeper API is included in the ZKClusterRT class in
>> > /utils/cluster_rt.h.
>> > I will make create_zk_node a public function for you to use.
>> >
>> > By the way, each commit needs to be attached to a JIRA ticket.
>> > Please create a JIRA ticket following the guide on the SINGA website.
>> > (You need to create a JIRA account first.)
>> >
>> > Best Regards,
>> >
>> > Sheng
>> >
>> >
>> >
>> > On Tue, Jun 23, 2015 at 9:54 AM, Anh Dinh <[email protected]> wrote:
>> >
>> > > Hi guys,
>> > >
>> > > Currently the standalone version (I followed the Quickstart guide)
>> > > starts the ZooKeeper service every time I run "singa-run.sh".
>> > >
>> > > I assume that in the distributed version, the ZK service will be
>> > > started only once by a master node, and the rest of the cluster will
>> > > know the ZK master address?
>> > >
>> > > In this case, since I'm writing Mesos support for SINGA, could I have
>> > > the following API from SingaZooKeeperService (or any class that
>> > > implements the ZK service for SINGA)?
>> > >
>> > > /**
>> > > * create a node with given name and content.
>> > > * the node can be located at $ZK_PREFIX/filename
>> > > */
>> > > static SingaZooKeeperService::create_zk_node(string filename, string content);
>> > >
>> > > I'm using this to write the content of the "hostfile" so that all
>> > > nodes can see it.
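
A sketch of the requested API follows, with several assumptions. The `bool` return type, the `ZK_PREFIX` environment variable with a `/singa` default, and the `read_zk_node` counterpart are all hypothetical. So is the in-memory map that stands in for ZooKeeper. A real implementation would forward to the ZooKeeper client inside ZKClusterRT, for example the C client's `zoo_create`, or `zoo_set` to overwrite an existing node. It would also need the prefix node to exist first.

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical sketch of the API requested in this thread. An in-memory map
// stands in for ZooKeeper so the example runs without a server.
class SingaZooKeeperService {
 public:
  // Assumption: the prefix comes from the ZK_PREFIX environment variable,
  // falling back to "/singa" when it is unset or empty.
  static std::string Prefix() {
    const char* env = std::getenv("ZK_PREFIX");
    return (env != nullptr && *env != '\0') ? std::string(env) : std::string("/singa");
  }

  // Create or overwrite $ZK_PREFIX/filename with `content`.
  // Rejects empty names and names containing '/', so every file maps to a
  // direct child of the prefix node.
  static bool create_zk_node(const std::string& filename, const std::string& content) {
    if (filename.empty() || filename.find('/') != std::string::npos) return false;
    Store()[Prefix() + "/" + filename] = content;
    return true;
  }

  // Read $ZK_PREFIX/filename back; returns an empty string if it is missing.
  static std::string read_zk_node(const std::string& filename) {
    const auto it = Store().find(Prefix() + "/" + filename);
    return it == Store().end() ? std::string() : it->second;
  }

 private:
  // Function-local static: one shared "namespace" for the whole process.
  static std::map<std::string, std::string>& Store() {
    static std::map<std::string, std::string> store;
    return store;
  }
};
```

With this shape, the Mesos code could publish the hostfile with `SingaZooKeeperService::create_zk_node("hostfile", contents)`, and every node would read it back from `$ZK_PREFIX/hostfile`.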
>> > >
>> > > Anh.
>> > >
>> >
>>
>
>
