Hi Venkat,

This looks like a problem with the node addresses. Please replace node0 and node2 with their IP addresses.
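
In case it helps, here is a minimal Python sketch of that replacement. It assumes the names to replace are the entries in conf/hostfile (one host per line) and that they resolve on the machine where you run it. The function name hostfile_with_ips is made up for illustration and is not part of SINGA.

```python
import socket
from typing import Callable, Iterable, List


def hostfile_with_ips(
    lines: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Return hostfile entries with each hostname replaced by an IPv4 address.

    Assumption: one host per line, as in conf/hostfile. Blank lines are
    kept as empty strings. Entries that are already IPv4 addresses come
    back unchanged from socket.gethostbyname.
    """
    converted = []
    for line in lines:
        host = line.strip()
        # Leave blank lines alone; resolve everything else.
        converted.append(resolve(host) if host else "")
    return converted
```

To use it, read conf/hostfile, pass the open file to hostfile_with_ips, and review the result before writing it back. If a name resolves to a loopback address such as 127.0.0.1, the other machine will not be able to reach it, so put the machine's real network address in the file instead.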

Regards,
Wei

On Wed, Jun 22, 2016 at 2:40 AM, Venkat Katta <[email protected]> wrote:

> I tried running without Mesos and got the same error.
>
> root@node0:~/incubator-singa# ./bin/singa-run.sh -conf examples/cifar10/hybrid.conf
> Unique JOB_ID is 4
> Record job information to /tmp/singa-log/job-info/job-4-20160621-183305
> Executing @ node2 : cd /root/incubator-singa; source /root/incubator-singa/conf/profile; ./singa -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf /root/incubator-singa/examples/cifar10/hybrid.conf
> Executing @ node0 : cd /root/incubator-singa; source /root/incubator-singa/conf/profile; ./singa -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf /root/incubator-singa/examples/cifar10/hybrid.conf
> F0621 18:33:24.171468 725 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://node2:*
> *** Check failure stack trace: ***
>     @ 0x7f10d0a6b9fd google::LogMessage::Fail()
>     @ 0x7f10d0a6d89d google::LogMessage::SendToLog()
>     @ 0x7f10d0a6b5ec google::LogMessage::Flush()
>     @ 0x7f10d0a6e1be google::LogMessageFatal::~LogMessageFatal()
>     @ 0x7f10d0e05d79 singa::Router::Bind()
>     @ 0x7f10d0d7a8bc singa::Driver::Train()
>     @ 0x7f10d0d7f48b singa::Driver::Train()
>     @ 0x40c915 main
>     @ 0x7f10c5f13f45 (unknown)
>     @ 0x40cb7e (unknown)
> F0621 18:33:06.244278 1042 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://node0:*
> *** Check failure stack trace: ***
>     @ 0x7f6d4516d9fd google::LogMessage::Fail()
>     @ 0x7f6d4516f89d google::LogMessage::SendToLog()
>     @ 0x7f6d4516d5ec google::LogMessage::Flush()
>     @ 0x7f6d451701be google::LogMessageFatal::~LogMessageFatal()
>     @ 0x7f6d45507d79 singa::Router::Bind()
>     @ 0x7f6d4547c8bc singa::Driver::Train()
>     @ 0x7f6d4548148b singa::Driver::Train()
>     @ 0x40c915 main
>     @ 0x7f6d3a615f45 (unknown)
>     @ 0x40cb7e (unknown)
> bash: line 1: 725 Aborted (core dumped) ./singa -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf /root/incubator-singa/examples/cifar10/hybrid.conf -host node2
> bash: line 1: 1042 Aborted (core dumped) ./singa -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf /root/incubator-singa/examples/cifar10/hybrid.conf -host node0
> E0621 18:33:07.467438 1067 job_manager.cc:156] job 4 not exists
>
> ------------------------------
> *From:* Wang Wei <[email protected]>
> *Sent:* Tuesday, June 21, 2016 7:09:46 PM
> *To:* Venkat Katta
> *Cc:* [email protected]
> *Subject:* Re: Error while running singa on mesos
>
> Hi,
>
> Can you try to run it without Mesos?
> 1. Compile SINGA with enable-dist
> 2. Change conf/singa.conf to set the ZooKeeper host
> 3. Update conf/hostfile, one line per machine
> 4. Update conf/profile to export LD_LIBRARY_PATH
>
> regards,
> Wei
>
> On Tue, Jun 21, 2016 at 8:52 PM, Venkat Katta <[email protected]> wrote:
>
>> Hi,
>>
>> I am trying to run SINGA on Mesos in a fully distributed
>> architecture. I built the Docker images as described in the
>> documentation. I am using Mesos 0.28.2 and SINGA 0.3-rc3. I am running
>> each Docker container with the --net=host flag so that it takes the IP
>> of the host system. SINGA works as long as the workers are all on one
>> machine. When I try to use two machines for training, it shows this
>> error:
>>
>> F0617 10:00:43.862246 2742 socket.cc:98] Check failed: port != -1 (-1 vs. -1) tcp://localhost:*
>>
>> So, when running the scheduler, do we need to give it a hostfile
>> containing all the hosts? How does it know about the remaining hosts
>> in the cluster?
>>
>> Thanks,
>>
>> Venkat Satish Katta.
