Let's say you create a container node0 on machine A, and node1 on machine B.
From node1, can you ping node0? If you cannot, then Weave wasn't running properly (with Docker v1.8.3).

Anh.

On 22 June 2016 at 15:42, Venkat Katta <[email protected]> wrote:

> As the docker containers are on different machines, I can no longer
> communicate between them, because their IPs are internal to each
> machine. So I am using Weave, as described in the documentation:
> https://singa.incubator.apache.org/docs/docker.html#launch_distributed .
> It is trying to bind zsock on localhost, not on node1 or node2.
>
> Regards,
> Venkat Satish Katta
> ------------------------------
> *From:* Anh Dinh <[email protected]>
> *Sent:* Wednesday, June 22, 2016 12:23:26 PM
> *To:* Venkat Katta
> *Cc:* Wang Wei; [email protected]
> *Subject:* Re: Error while running singa on mesos
>
> We had problems with Docker version >= 1.9 (yours is even newer), as noted
> in https://singa.incubator.apache.org/docs/docker.html#launch_pseudo
>
> Basically, new versions of Docker changed the DNS resolution mechanism: the
> Docker daemon no longer updates the /etc/hosts file of existing containers
> when a new one is launched.
>
> One suggestion is to downgrade Docker to 1.8:
>
>     sudo apt-get install docker-engine=1.8.3-0~trusty
>
> Another option is to enter IP addresses manually into the /etc/hosts files.
> But we have not tried that with Weave, so there's a high chance that it
> won't work with Weave.
>
>
> On 22 June 2016 at 14:39, Venkat Katta <[email protected]> wrote:
>
>> docker version : 1.11.2
>>
>> regards,
>> venkat satish katta
>> ------------------------------
>> *From:* Anh Dinh <[email protected]>
>> *Sent:* Wednesday, June 22, 2016 12:04:56 PM
>> *To:* Wang Wei; Venkat Katta
>> *Cc:* [email protected]
>> *Subject:* Re: Error while running singa on mesos
>>
>> What version of Docker are you running?
>>
>> Anh.
>>
>>
>> On 22 June 2016 at 14:26, Wang Wei <[email protected]> wrote:
>>
>>> ---------- Forwarded message ----------
>>> From: Venkat Katta <[email protected]>
>>> Date: Wed, Jun 22, 2016 at 1:31 PM
>>> Subject: Re: Error while running singa on mesos
>>> To: Wang Wei <[email protected]>
>>>
>>>
>>> It works fine if I replace node0 and node2 with their IP addresses. I
>>> am using weave for transparent communication between the containers. In
>>> singa.conf, I used node0 rather than its IP address to connect to
>>> zookeeper, and it is able to connect, so why can't singa resolve the
>>> hostname? Also, when running singa with mesos, it uses localhost rather
>>> than the IP addresses of node1 and node2, and we are not giving any
>>> argument regarding the IP addresses of the slaves when running singa.
>>>
>>>
>>> F0622 05:18:28.932391 1513 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://localhost:*
>>>
>>>
>>> Thanks,
>>>
>>> Venkat satish katta
>>> ------------------------------
>>> *From:* Wang Wei <[email protected]>
>>> *Sent:* Wednesday, June 22, 2016 8:46:36 AM
>>> *To:* Venkat Katta
>>> *Subject:* Re: Error while running singa on mesos
>>>
>>> If you are using Docker (without mesos), it could be a network routing
>>> problem. You may need to configure Docker to set up the network so that
>>> node0 and node2 can be accessed from node1.
>>> We are trying your configuration.
>>>
>>> regards,
>>> wang wei
>>>
>>>
>>> On Wed, Jun 22, 2016 at 10:32 AM, Wang Wei <[email protected]> wrote:
>>>
>>>> Hi Venkat,
>>>>
>>>> It should be a problem with the node address.
>>>> Please replace node0 and node2 with their IP addresses.
>>>>
>>>> regards,
>>>> wei
>>>>
>>>> On Wed, Jun 22, 2016 at 2:40 AM, Venkat Katta <[email protected]> wrote:
>>>>
>>>>> I tried running without mesos and got the same error:
>>>>>
>>>>>
>>>>> root@node0:~/incubator-singa# ./bin/singa-run.sh -conf examples/cifar10/hybrid.conf
>>>>> Unique JOB_ID is 4
>>>>> Record job information to /tmp/singa-log/job-info/job-4-20160621-183305
>>>>> Executing @ node2 : cd /root/incubator-singa; source /root/incubator-singa/conf/profile; ./singa -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf /root/incubator-singa/examples/cifar10/hybrid.conf
>>>>> Executing @ node0 : cd /root/incubator-singa; source /root/incubator-singa/conf/profile; ./singa -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf /root/incubator-singa/examples/cifar10/hybrid.conf
>>>>> F0621 18:33:24.171468 725 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://node2:*
>>>>> *** Check failure stack trace: ***
>>>>>     @ 0x7f10d0a6b9fd google::LogMessage::Fail()
>>>>>     @ 0x7f10d0a6d89d google::LogMessage::SendToLog()
>>>>>     @ 0x7f10d0a6b5ec google::LogMessage::Flush()
>>>>>     @ 0x7f10d0a6e1be google::LogMessageFatal::~LogMessageFatal()
>>>>>     @ 0x7f10d0e05d79 singa::Router::Bind()
>>>>>     @ 0x7f10d0d7a8bc singa::Driver::Train()
>>>>>     @ 0x7f10d0d7f48b singa::Driver::Train()
>>>>>     @ 0x40c915 main
>>>>>     @ 0x7f10c5f13f45 (unknown)
>>>>>     @ 0x40cb7e (unknown)
>>>>> F0621 18:33:06.244278 1042 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://node0:*
>>>>> *** Check failure stack trace: ***
>>>>>     @ 0x7f6d4516d9fd google::LogMessage::Fail()
>>>>>     @ 0x7f6d4516f89d google::LogMessage::SendToLog()
>>>>>     @ 0x7f6d4516d5ec google::LogMessage::Flush()
>>>>>     @ 0x7f6d451701be google::LogMessageFatal::~LogMessageFatal()
>>>>>     @ 0x7f6d45507d79 singa::Router::Bind()
>>>>>     @ 0x7f6d4547c8bc singa::Driver::Train()
>>>>>     @ 0x7f6d4548148b singa::Driver::Train()
>>>>>     @ 0x40c915 main
>>>>>     @ 0x7f6d3a615f45 (unknown)
>>>>>     @ 0x40cb7e (unknown)
>>>>> bash: line 1: 725 Aborted (core dumped) ./singa -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf /root/incubator-singa/examples/cifar10/hybrid.conf -host node2
>>>>> bash: line 1: 1042 Aborted (core dumped) ./singa -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf /root/incubator-singa/examples/cifar10/hybrid.conf -host node0
>>>>> E0621 18:33:07.467438 1067 job_manager.cc:156] job 4 not exists
>>>>>
>>>>>
>>>>> ------------------------------
>>>>> *From:* Wang Wei <[email protected]>
>>>>> *Sent:* Tuesday, June 21, 2016 7:09:46 PM
>>>>> *To:* Venkat Katta
>>>>> *Cc:* [email protected]
>>>>> *Subject:* Re: Error while running singa on mesos
>>>>>
>>>>> Hi,
>>>>>
>>>>> Can you try to run it without Mesos?
>>>>> 1. Compile singa with enable-dist
>>>>> 2. Change conf/singa.conf to set the zookeeper host
>>>>> 3. Update conf/hostfile, one line per machine
>>>>> 4. Update conf/profile to export LD_LIBRARY_PATH
>>>>>
>>>>> regards,
>>>>> Wei
>>>>>
>>>>> On Tue, Jun 21, 2016 at 8:52 PM, Venkat Katta <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> I am trying to run singa on mesos in a fully distributed
>>>>>> architecture. I built the docker images as given in the documentation.
>>>>>> I am using mesos 0.28.2 and singa 0.3-rc3. I am running each docker
>>>>>> container with the --net=host flag so that it takes the IP of the
>>>>>> system. Singa works as long as the workers are all on one machine.
>>>>>> When I try to use two machines for training, it shows this error:
>>>>>>
>>>>>>
>>>>>> F0617 10:00:43.862246 2742 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://localhost:*
>>>>>>
>>>>>>
>>>>>> So when running the scheduler, do we need to give it a hostfile
>>>>>> containing all the hosts? How does it know the remaining hosts in the
>>>>>> cluster?
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>
>>>>>> Venkat Satish Katta.
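
The two suggestions in this thread (check whether one node can reach another by name, and enter the IP addresses into /etc/hosts by hand) can be sketched in Python roughly as follows. This is only a minimal sketch, not something from the SINGA or Weave tooling: the helper names resolve_hosts, unresolved and hosts_file_lines are made up for illustration, and the 10.x addresses in the comments are placeholders, not the addresses of any machine discussed above.

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional


def resolve_hosts(
    hosts: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each hostname to its IPv4 address, or None if it does not resolve.

    The resolver is injectable so the logic can be exercised without a network.
    """
    results: Dict[str, Optional[str]] = {}
    for host in hosts:
        try:
            results[host] = resolver(host)
        except OSError:  # socket.gaierror and socket.herror are subclasses
            results[host] = None
    return results


def unresolved(results: Dict[str, Optional[str]]) -> List[str]:
    """Return the hostnames that failed to resolve, in sorted order."""
    return sorted(name for name, ip in results.items() if ip is None)


def hosts_file_lines(name_to_ip: Dict[str, str]) -> List[str]:
    """Render "IP<TAB>hostname" lines in the format /etc/hosts expects."""
    return [f"{ip}\t{name}" for name, ip in sorted(name_to_ip.items())]


# Hypothetical usage inside a container (node names taken from the thread,
# addresses are placeholders):
#
#   print(unresolved(resolve_hosts(["node0", "node1", "node2"])))
#   print("\n".join(hosts_file_lines({"node0": "10.32.0.1", "node2": "10.40.0.1"})))
```

If the first call reports any of the node names as unresolved from inside a container, that would be consistent with the /etc/hosts behaviour described above for Docker >= 1.9. The generated lines would still have to be appended to /etc/hosts in every container by hand, and as noted in the thread this has not been tried together with Weave.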
