What version of Docker are you running?

Anh.
On 22 June 2016 at 14:26, Wang Wei <[email protected]> wrote:
>
> ---------- Forwarded message ----------
> From: Venkat Katta <[email protected]>
> Date: Wed, Jun 22, 2016 at 1:31 PM
> Subject: Re: Error while running singa on mesos
> To: Wang Wei <[email protected]>
>
>
> It works fine if I replace node0 and node2 with their IP addresses. I am
> using weave for transparent communication between the containers. In
> singa.conf I used node0, not the IP address of node0, to connect to
> zookeeper, and it is able to connect, so why can't singa resolve the
> hostname? Also, while running singa with mesos it is using localhost rather
> than the IP addresses of node1 and node2, and we are not giving any argument
> regarding the IP addresses of the slaves while running singa.
>
>
> F0622 05:18:28.932391 1513 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://localhost:*
>
>
> Thanks,
>
> Venkat satish katta
> ------------------------------
> *From:* Wang Wei <[email protected]>
> *Sent:* Wednesday, June 22, 2016 8:46:36 AM
> *To:* Venkat Katta
>
> *Subject:* Re: Error while running singa on mesos
>
> If you are using Docker (without Mesos), it could be a network routing
> problem. You may need to configure Docker to set up the network so that
> node0 and node2 can be accessed from node1.
> We are trying your configuration.
>
> regards,
> wang wei
>
>
> On Wed, Jun 22, 2016 at 10:32 AM, Wang Wei <[email protected]> wrote:
>
>> Hi Venkat,
>>
>> It should be the problem of the node address.
>> Pls replace node0 and node2 with their IP addresses.
>>
>> regards,
>> wei
>>
>> On Wed, Jun 22, 2016 at 2:40 AM, Venkat Katta <[email protected]> wrote:
>>
>>> I tried running without mesos and I got the same error.
>>>
>>>
>>> root@node0:~/incubator-singa# ./bin/singa-run.sh -conf examples/cifar10/hybrid.conf
>>> Unique JOB_ID is 4
>>> Record job information to /tmp/singa-log/job-info/job-4-20160621-183305
>>> Executing @ node2 : cd /root/incubator-singa; source /root/incubator-singa/conf/profile; ./singa -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf /root/incubator-singa/examples/cifar10/hybrid.conf
>>> Executing @ node0 : cd /root/incubator-singa; source /root/incubator-singa/conf/profile; ./singa -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf /root/incubator-singa/examples/cifar10/hybrid.conf
>>> F0621 18:33:24.171468 725 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://node2:*
>>> *** Check failure stack trace: ***
>>> @ 0x7f10d0a6b9fd google::LogMessage::Fail()
>>> @ 0x7f10d0a6d89d google::LogMessage::SendToLog()
>>> @ 0x7f10d0a6b5ec google::LogMessage::Flush()
>>> @ 0x7f10d0a6e1be google::LogMessageFatal::~LogMessageFatal()
>>> @ 0x7f10d0e05d79 singa::Router::Bind()
>>> @ 0x7f10d0d7a8bc singa::Driver::Train()
>>> @ 0x7f10d0d7f48b singa::Driver::Train()
>>> @ 0x40c915 main
>>> @ 0x7f10c5f13f45 (unknown)
>>> @ 0x40cb7e (unknown)
>>> F0621 18:33:06.244278 1042 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://node0:*
>>> *** Check failure stack trace: ***
>>> @ 0x7f6d4516d9fd google::LogMessage::Fail()
>>> @ 0x7f6d4516f89d google::LogMessage::SendToLog()
>>> @ 0x7f6d4516d5ec google::LogMessage::Flush()
>>> @ 0x7f6d451701be google::LogMessageFatal::~LogMessageFatal()
>>> @ 0x7f6d45507d79 singa::Router::Bind()
>>> @ 0x7f6d4547c8bc singa::Driver::Train()
>>> @ 0x7f6d4548148b singa::Driver::Train()
>>> @ 0x40c915 main
>>> @ 0x7f6d3a615f45 (unknown)
>>> @ 0x40cb7e (unknown)
>>> bash: line 1: 725 Aborted (core dumped) ./singa -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf /root/incubator-singa/examples/cifar10/hybrid.conf -host node2
>>> bash: line 1: 1042 Aborted (core dumped) ./singa -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf /root/incubator-singa/examples/cifar10/hybrid.conf -host node0
>>> E0621 18:33:07.467438 1067 job_manager.cc:156] job 4 not exists
>>>
>>>
>>> ------------------------------
>>> *From:* Wang Wei <[email protected]>
>>> *Sent:* Tuesday, June 21, 2016 7:09:46 PM
>>> *To:* Venkat Katta
>>> *Cc:* [email protected]
>>> *Subject:* Re: Error while running singa on mesos
>>>
>>> Hi,
>>>
>>> Can you try to run it without Mesos?
>>> 1. Compile singa with enable-dist
>>> 2. change conf/singa.conf to set the zookeeper host
>>> 3. update the conf/hostfile one line per machine
>>> 4. update the conf/profile to export LD_LIBRARY_PATH
>>>
>>> regards,
>>> Wei
>>>
>>> On Tue, Jun 21, 2016 at 8:52 PM, Venkat Katta <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> I am actually trying to run singa on mesos in a fully distributed
>>>> architecture. I built the docker images as given in the documentation.
>>>> I am using mesos 0.28.2 and singa 0.3-rc3. I am running each docker
>>>> container using the --net=host flag so that they take the IP of the
>>>> system. Singa works as long as the workers are all on one machine.
>>>> When I try to use two machines for training it shows this error:
>>>>
>>>>
>>>> F0617 10:00:43.862246 2742 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://localhost:*
>>>>
>>>>
>>>> So while running the scheduler, do we need to give it a hostfile
>>>> containing all the hosts? How does it know the remaining hosts in the
>>>> cluster?
>>>>
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> Venkat Satish Katta.
>>>>
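
Since the workaround in this thread was to replace node0 and node2 with their IP addresses, it may help to check on each container which hostnames actually resolve before starting a job. The following is a minimal Python sketch of that check, not part of SINGA: the function names `resolve_hosts` and `hostfile_lines` are hypothetical, and it assumes conf/hostfile accepts one IP address per line, as the "one line per machine" step above suggests.

```python
import socket


def resolve_hosts(hostnames, resolver=socket.gethostbyname):
    """Map each hostname to an IPv4 address, or None if it does not resolve.

    `resolver` defaults to the system lookup and can be swapped for testing.
    """
    resolved = {}
    for name in hostnames:
        try:
            resolved[name] = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            resolved[name] = None
    return resolved


def hostfile_lines(hostnames, resolver=socket.gethostbyname):
    """Return one IP address per host, in the order given.

    Raises ValueError naming every host that could not be resolved, so a
    bad entry is caught before a job is launched.
    """
    resolved = resolve_hosts(hostnames, resolver)
    missing = [name for name in hostnames if resolved[name] is None]
    if missing:
        raise ValueError("cannot resolve: " + ", ".join(missing))
    return [resolved[name] for name in hostnames]
```

Calling `resolve_hosts(["node0", "node1", "node2"])` inside each container shows which names that container can see. If every name resolves, the output of `hostfile_lines` can be written to conf/hostfile one entry per line; whether SINGA then binds successfully still depends on the network setup discussed above.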
