Hi Venkat,

This looks like a problem with the node addresses.
Please replace node0 and node2 with their IP addresses.
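
If it helps, here is a minimal sketch for looking up the addresses to use. It is not part of SINGA: the function names resolve_hosts and hostfile_lines are made up for illustration, and it assumes the hostnames resolve on the machine where you run it.

```python
import socket
import sys


def resolve_hosts(hostnames):
    """Map each hostname to an IPv4 address, or None if it does not resolve."""
    resolved = {}
    for name in hostnames:
        try:
            resolved[name] = socket.gethostbyname(name)
        except OSError:
            # socket.gaierror (a subclass of OSError) is raised on lookup failure.
            resolved[name] = None
    return resolved


def hostfile_lines(hostnames):
    """Return one IP address per hostname, in order; fail if any lookup fails."""
    resolved = resolve_hosts(hostnames)
    missing = [name for name in hostnames if resolved[name] is None]
    if missing:
        raise ValueError("cannot resolve: " + ", ".join(missing))
    return [resolved[name] for name in hostnames]


if __name__ == "__main__":
    # Hypothetical usage: python resolve_hosts.py node0 node2
    for line in hostfile_lines(sys.argv[1:]):
        print(line)
```

Check the output before using it. If a name resolves to a loopback address such as 127.0.0.1 or 127.0.1.1, the other machines cannot reach it, so use the address of the real network interface instead.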

regards,
wei

On Wed, Jun 22, 2016 at 2:40 AM, Venkat Katta <[email protected]> wrote:

> I tried running without Mesos and got the same error:
>
>
> root@node0:~/incubator-singa# ./bin/singa-run.sh -conf
> examples/cifar10/hybrid.conf
> Unique JOB_ID is 4
> Record job information to /tmp/singa-log/job-info/job-4-20160621-183305
> Executing @ node2 : cd /root/incubator-singa; source
> /root/incubator-singa/conf/profile; ./singa -singa_conf
> /root/incubator-singa/conf/singa.conf -singa_job 4 -conf
> /root/incubator-singa/examples/cifar10/hybrid.conf
> Executing @ node0 : cd /root/incubator-singa; source
> /root/incubator-singa/conf/profile; ./singa -singa_conf
> /root/incubator-singa/conf/singa.conf -singa_job 4 -conf
> /root/incubator-singa/examples/cifar10/hybrid.conf
> F0621 18:33:24.171468   725 socket.cc:98] Check failed: port != -1 (-1 vs.
> -1) tcp://node2:*
> *** Check failure stack trace: ***
>     @     0x7f10d0a6b9fd  google::LogMessage::Fail()
>     @     0x7f10d0a6d89d  google::LogMessage::SendToLog()
>     @     0x7f10d0a6b5ec  google::LogMessage::Flush()
>     @     0x7f10d0a6e1be  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f10d0e05d79  singa::Router::Bind()
>     @     0x7f10d0d7a8bc  singa::Driver::Train()
>     @     0x7f10d0d7f48b  singa::Driver::Train()
>     @           0x40c915  main
>     @     0x7f10c5f13f45  (unknown)
>     @           0x40cb7e  (unknown)
> F0621 18:33:06.244278  1042 socket.cc:98] Check failed: port != -1 (-1 vs.
> -1) tcp://node0:*
> *** Check failure stack trace: ***
>     @     0x7f6d4516d9fd  google::LogMessage::Fail()
>     @     0x7f6d4516f89d  google::LogMessage::SendToLog()
>     @     0x7f6d4516d5ec  google::LogMessage::Flush()
>     @     0x7f6d451701be  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f6d45507d79  singa::Router::Bind()
>     @     0x7f6d4547c8bc  singa::Driver::Train()
>     @     0x7f6d4548148b  singa::Driver::Train()
>     @           0x40c915  main
>     @     0x7f6d3a615f45  (unknown)
>     @           0x40cb7e  (unknown)
> bash: line 1:   725 Aborted                 (core dumped) ./singa
> -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf
> /root/incubator-singa/examples/cifar10/hybrid.conf -host node2
> bash: line 1:  1042 Aborted                 (core dumped) ./singa
> -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf
> /root/incubator-singa/examples/cifar10/hybrid.conf -host node0
> E0621 18:33:07.467438  1067 job_manager.cc:156] job 4 not exists
>
>
> ------------------------------
> *From:* Wang Wei <[email protected]>
> *Sent:* Tuesday, June 21, 2016 7:09:46 PM
> *To:* Venkat Katta
> *Cc:* [email protected]
> *Subject:* Re: Error while running singa on mesos
>
> Hi,
>
> Can you try to run it without Mesos?
> 1. Compile SINGA with enable-dist.
> 2. Change conf/singa.conf to set the ZooKeeper host.
> 3. Update conf/hostfile, with one line per machine.
> 4. Update conf/profile to export LD_LIBRARY_PATH.
>
> regards,
> Wei
>
> On Tue, Jun 21, 2016 at 8:52 PM, Venkat Katta <[email protected]> wrote:
>
>> Hi,
>>
>>
>> I am trying to run SINGA on Mesos in a fully distributed
>> architecture. I built the Docker images as described in the documentation. I am
>> using Mesos 0.28.2 and SINGA 0.3-rc3. I am running each Docker container
>> with the --net=host flag so that it takes the IP of the host system. SINGA works
>> as long as the workers are all on one machine.
>> When I try to use two machines for training, it shows this error:
>>
>>
>> F0617 10:00:43.862246 2742 socket.cc:98] Check failed: port != -1 (-1 vs.
>> -1) tcp://localhost:*
>>
>>
>> So, when running the scheduler, do we need to give it a hostfile
>> containing all the hosts? How does it know about the remaining hosts in the cluster?
>>
>>
>> Thanks,
>>
>>
>> Venkat Satish Katta.
>>
>
>
