Please check the new pull request: https://github.com/apache/incubator-singa/pull/182
See if you can launch with Mesos. I tested with 3 machines and it works.

Anh.

On 22 June 2016 at 15:54, Venkat Katta <[email protected]> wrote:

> Yes, I can ping node0 from node1. I am also able to connect to zookeeper
> and the mesos master on node0 from node1.
>
> Thanks,
>
> Venkat Satish Katta
> ------------------------------
> *From:* Anh Dinh <[email protected]>
> *Sent:* Wednesday, June 22, 2016 1:20:43 PM
> *To:* Venkat Katta
> *Cc:* Wang Wei; [email protected]
> *Subject:* Re: Error while running singa on mesos
>
> Let's say you create a container node0 on machine A, and node1 on
> machine B.
>
> In node1, can you ping node0?
>
> If you cannot, then Weave wasn't running properly (with Docker v1.8.3).
>
> Anh.
>
> On 22 June 2016 at 15:42, Venkat Katta <[email protected]> wrote:
>
>> As the docker containers are on different machines, I can no longer
>> communicate between them, because their IPs are internal to each
>> machine. So I am using Weave, as described in the documentation:
>> https://singa.incubator.apache.org/docs/docker.html#launch_distributed
>> It is trying to bind the zsock on localhost, not on node1 or node2.
>>
>> Regards,
>> Venkat Satish Katta
>> ------------------------------
>> *From:* Anh Dinh <[email protected]>
>> *Sent:* Wednesday, June 22, 2016 12:23:26 PM
>> *To:* Venkat Katta
>> *Cc:* Wang Wei; [email protected]
>> *Subject:* Re: Error while running singa on mesos
>>
>> We had problems with Docker version >= 1.9 (yours is even newer), as
>> noted in
>> https://singa.incubator.apache.org/docs/docker.html#launch_pseudo
>>
>> Basically, new versions of Docker changed the DNS resolution mechanism:
>> the Docker daemon no longer updates the /etc/hosts file of existing
>> containers when a new one is launched.
>>
>> One suggestion is to downgrade Docker to 1.8:
>>
>>     sudo apt-get install docker-engine=1.8.3-0~trusty
>>
>> Another option is to enter the IP addresses manually into the
>> /etc/hosts files. But we have not tried it with Weave, so there's a
>> high chance that it won't work with Weave.
>>
>> On 22 June 2016 at 14:39, Venkat Katta <[email protected]> wrote:
>>
>>> docker version : 1.11.2
>>>
>>> regards,
>>> venkat satish katta
>>> ------------------------------
>>> *From:* Anh Dinh <[email protected]>
>>> *Sent:* Wednesday, June 22, 2016 12:04:56 PM
>>> *To:* Wang Wei; Venkat Katta
>>> *Cc:* [email protected]
>>> *Subject:* Re: Error while running singa on mesos
>>>
>>> What version of Docker are you running?
>>>
>>> Anh.
>>>
>>> On 22 June 2016 at 14:26, Wang Wei <[email protected]> wrote:
>>>
>>>> ---------- Forwarded message ----------
>>>> From: Venkat Katta <[email protected]>
>>>> Date: Wed, Jun 22, 2016 at 1:31 PM
>>>> Subject: Re: Error while running singa on mesos
>>>> To: Wang Wei <[email protected]>
>>>>
>>>> It works fine if I replace node0 and node2 with their IP addresses.
>>>> I am using Weave for transparent communication between the
>>>> containers. In singa.conf, I used node0 rather than its IP address
>>>> to connect to zookeeper, and it is able to connect, so why can't
>>>> singa resolve the hostname? Also, when running singa with mesos, it
>>>> uses localhost rather than the IP addresses of node1 and node2, and
>>>> we are not giving singa any argument with the IP addresses of the
>>>> slaves.
>>>>
>>>> F0622 05:18:28.932391 1513 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://localhost:*
>>>>
>>>> Thanks,
>>>>
>>>> Venkat satish katta
>>>> ------------------------------
>>>> *From:* Wang Wei <[email protected]>
>>>> *Sent:* Wednesday, June 22, 2016 8:46:36 AM
>>>> *To:* Venkat Katta
>>>> *Subject:* Re: Error while running singa on mesos
>>>>
>>>> If you are using Docker (without mesos), it could be a network
>>>> routing problem. You may need to configure Docker's network so that
>>>> node0 and node2 can be accessed from node1.
>>>> We are trying your configuration.
>>>>
>>>> regards,
>>>> wang wei
>>>>
>>>> On Wed, Jun 22, 2016 at 10:32 AM, Wang Wei <[email protected]> wrote:
>>>>
>>>>> Hi Venkat,
>>>>>
>>>>> It is probably a problem with the node addresses.
>>>>> Please replace node0 and node2 with their IP addresses.
>>>>>
>>>>> regards,
>>>>> wei
>>>>>
>>>>> On Wed, Jun 22, 2016 at 2:40 AM, Venkat Katta <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I tried running without mesos and got the same error:
>>>>>>
>>>>>> root@node0:~/incubator-singa# ./bin/singa-run.sh -conf examples/cifar10/hybrid.conf
>>>>>> Unique JOB_ID is 4
>>>>>> Record job information to /tmp/singa-log/job-info/job-4-20160621-183305
>>>>>> Executing @ node2 : cd /root/incubator-singa; source /root/incubator-singa/conf/profile; ./singa -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf /root/incubator-singa/examples/cifar10/hybrid.conf
>>>>>> Executing @ node0 : cd /root/incubator-singa; source /root/incubator-singa/conf/profile; ./singa -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf /root/incubator-singa/examples/cifar10/hybrid.conf
>>>>>> F0621 18:33:24.171468 725 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://node2:*
>>>>>> *** Check failure stack trace: ***
>>>>>>     @ 0x7f10d0a6b9fd google::LogMessage::Fail()
>>>>>>     @ 0x7f10d0a6d89d google::LogMessage::SendToLog()
>>>>>>     @ 0x7f10d0a6b5ec google::LogMessage::Flush()
>>>>>>     @ 0x7f10d0a6e1be google::LogMessageFatal::~LogMessageFatal()
>>>>>>     @ 0x7f10d0e05d79 singa::Router::Bind()
>>>>>>     @ 0x7f10d0d7a8bc singa::Driver::Train()
>>>>>>     @ 0x7f10d0d7f48b singa::Driver::Train()
>>>>>>     @ 0x40c915 main
>>>>>>     @ 0x7f10c5f13f45 (unknown)
>>>>>>     @ 0x40cb7e (unknown)
>>>>>> F0621 18:33:06.244278 1042 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://node0:*
>>>>>> *** Check failure stack trace: ***
>>>>>>     @ 0x7f6d4516d9fd google::LogMessage::Fail()
>>>>>>     @ 0x7f6d4516f89d google::LogMessage::SendToLog()
>>>>>>     @ 0x7f6d4516d5ec google::LogMessage::Flush()
>>>>>>     @ 0x7f6d451701be google::LogMessageFatal::~LogMessageFatal()
>>>>>>     @ 0x7f6d45507d79 singa::Router::Bind()
>>>>>>     @ 0x7f6d4547c8bc singa::Driver::Train()
>>>>>>     @ 0x7f6d4548148b singa::Driver::Train()
>>>>>>     @ 0x40c915 main
>>>>>>     @ 0x7f6d3a615f45 (unknown)
>>>>>>     @ 0x40cb7e (unknown)
>>>>>> bash: line 1: 725 Aborted (core dumped) ./singa -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf /root/incubator-singa/examples/cifar10/hybrid.conf -host node2
>>>>>> bash: line 1: 1042 Aborted (core dumped) ./singa -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf /root/incubator-singa/examples/cifar10/hybrid.conf -host node0
>>>>>> E0621 18:33:07.467438 1067 job_manager.cc:156] job 4 not exists
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Wang Wei <[email protected]>
>>>>>> *Sent:* Tuesday, June 21, 2016 7:09:46 PM
>>>>>> *To:* Venkat Katta
>>>>>> *Cc:* [email protected]
>>>>>> *Subject:* Re: Error while running singa on mesos
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Can you try to run it without Mesos?
>>>>>> 1. Compile singa with enable-dist
>>>>>> 2. Change conf/singa.conf to set the zookeeper host
>>>>>> 3. Update conf/hostfile, one line per machine
>>>>>> 4. Update conf/profile to export LD_LIBRARY_PATH
>>>>>>
>>>>>> regards,
>>>>>> Wei
>>>>>>
>>>>>> On Tue, Jun 21, 2016 at 8:52 PM, Venkat Katta <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am trying to run singa on mesos in a fully distributed
>>>>>>> architecture. I built the docker images as given in the
>>>>>>> documentation. I am using mesos 0.28.2 and singa 0.3-rc3. I am
>>>>>>> running each docker container with the --net=host flag so that
>>>>>>> it takes the IP of the host system. Singa works as long as the
>>>>>>> workers are all on one machine. When I try to use two machines
>>>>>>> for training, it shows this error:
>>>>>>>
>>>>>>> F0617 10:00:43.862246 2742 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://localhost:*
>>>>>>>
>>>>>>> So when running the scheduler, do we need to give it a hostfile
>>>>>>> containing all the hosts? How does it know the remaining hosts
>>>>>>> in the cluster?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Venkat Satish Katta.
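
A note on the name-resolution issue discussed in this thread: before launching a distributed job, it can help to confirm, inside each container, that every name listed in conf/hostfile resolves to a non-loopback address. The following is only a minimal sketch under stated assumptions, not part of SINGA: the helper names (parse_hostfile, check_hosts, problems) are hypothetical, and the hostfile is assumed to hold one host name per line, as described above. It uses only Python's standard socket module.

```python
import socket


def parse_hostfile(text):
    """Return the non-empty lines of a hostfile (assumed: one host per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_hosts(hosts, resolve=socket.gethostbyname):
    """Map each host name to its resolved IPv4 address, or None on failure.

    `resolve` is injectable so the check can be exercised without a network.
    """
    resolved = {}
    for host in hosts:
        try:
            resolved[host] = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            resolved[host] = None
    return resolved


def problems(resolved):
    """Describe hosts that do not resolve or that resolve to loopback."""
    issues = []
    for host, ip in resolved.items():
        if ip is None:
            issues.append(f"{host}: does not resolve")
        elif ip.startswith("127."):
            issues.append(f"{host}: resolves to loopback {ip}")
    return issues
```

To use it, read conf/hostfile, pass its text to parse_hostfile, and print problems(check_hosts(hosts)) in each container. An empty list only shows that the names resolve; it does not prove that the containers can reach each other, so the ping test suggested above is still worthwhile.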
