Please check the new pull request:
https://github.com/apache/incubator-singa/pull/182

See if you can launch with Mesos. I tested with 3 machines and it
works.
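
Before launching, it may be worth confirming that every container can
resolve the other node names, since the earlier "Check failed: port != -1"
errors went away once node names were replaced with IP addresses. A
minimal sketch in Python, not part of SINGA: the helper name check_hosts
is hypothetical, and node0, node1, node2 are the container names used in
this thread.

```python
import socket

def check_hosts(hosts, resolve=socket.gethostbyname):
    """Map each hostname to its resolved IPv4 address, or None on failure.

    `resolve` defaults to the system resolver; it is a parameter only so
    the lookup can be swapped out (for example in a test).
    """
    results = {}
    for host in hosts:
        try:
            results[host] = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[host] = None
    return results
```

Calling check_hosts(["node0", "node1", "node2"]) inside each container
and looking for None entries shows which names still fail to resolve.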

Anh.




On 22 June 2016 at 15:54, Venkat Katta <[email protected]> wrote:

> Yes, I can ping node0 from node1. I am also able to connect to ZooKeeper
> and the Mesos master on node0 from node1.
>
>
> Thanks,
>
>
> Venkat Satish Katta
> ------------------------------
> *From:* Anh Dinh <[email protected]>
> *Sent:* Wednesday, June 22, 2016 1:20:43 PM
>
> *To:* Venkat Katta
> *Cc:* Wang Wei; [email protected]
> *Subject:* Re: Error while running singa on mesos
>
> Let's say you create a container node0 on machine A, and node1 on machine
> B.
>
> In node1, can you ping node0?
>
> If you cannot, then Weave wasn't running properly (with Docker v1.8.3).
>
> Anh.
>
>
> On 22 June 2016 at 15:42, Venkat Katta <[email protected]> wrote:
>
>> As the Docker containers are on different machines, they can no longer
>> communicate with each other directly, because their IPs are internal to
>> each machine. So I am using Weave, as described in the documentation:
>> https://singa.incubator.apache.org/docs/docker.html#launch_distributed .
>> SINGA is trying to bind zsock on localhost, not on node1 or node2.
>>
>> Regards,
>> Venkat Satish Katta
>> ------------------------------
>> *From:* Anh Dinh <[email protected]>
>> *Sent:* Wednesday, June 22, 2016 12:23:26 PM
>> *To:* Venkat Katta
>> *Cc:* Wang Wei; [email protected]
>>
>> *Subject:* Re: Error while running singa on mesos
>>
>> We had problems with Docker version >= 1.9 (yours is even newer), as
>> noted in
>> https://singa.incubator.apache.org/docs/docker.html#launch_pseudo
>>
>> Basically, new versions of Docker changed the DNS resolution mechanism:
>> the Docker daemon no longer updates the /etc/hosts file of existing
>> containers when a new one is launched.
>>
>> One suggestion is to downgrade Docker to 1.8:
>>
>> sudo apt-get install docker-engine=1.8.3-0~trusty
>>
>> Another option is to enter IP addresses manually into /etc/hosts files.
>> But we have not tried it with Weave, so there's a high chance that it
>> won't work with Weave.
>>
>>
>> On 22 June 2016 at 14:39, Venkat Katta <[email protected]> wrote:
>>
>>> docker version : 1.11.2
>>>
>>> regards,
>>> venkat satish katta
>>> ------------------------------
>>> *From:* Anh Dinh <[email protected]>
>>> *Sent:* Wednesday, June 22, 2016 12:04:56 PM
>>> *To:* Wang Wei; Venkat Katta
>>>
>>> *Cc:* [email protected]
>>> *Subject:* Re: Error while running singa on mesos
>>>
>>> What version of Docker are you running?
>>>
>>> Anh.
>>>
>>>
>>> On 22 June 2016 at 14:26, Wang Wei <[email protected]> wrote:
>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Venkat Katta <[email protected]>
>>>> Date: Wed, Jun 22, 2016 at 1:31 PM
>>>> Subject: Re: Error while running singa on mesos
>>>> To: Wang Wei <[email protected]>
>>>>
>>>>
>>>> It works fine if I replace node0 and node2 with their IP addresses. I
>>>> am using Weave for transparent communication between the containers. In
>>>> singa.conf I used node0, not the IP address of node0, to connect to
>>>> ZooKeeper, and it is able to connect. So why can't SINGA resolve the
>>>> hostname? Also, when running SINGA with Mesos, it uses localhost rather
>>>> than the IP addresses of node1 and node2, and we are not passing any
>>>> argument regarding the IP addresses of the slaves when running SINGA.
>>>>
>>>>
>>>> F0622 05:18:28.932391  1513 socket.cc:98] Check failed: port != -1 (-1
>>>> vs. -1) tcp://localhost:*
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Venkat satish katta
>>>> ------------------------------
>>>> *From:* Wang Wei <[email protected]>
>>>> *Sent:* Wednesday, June 22, 2016 8:46:36 AM
>>>> *To:* Venkat Katta
>>>>
>>>> *Subject:* Re: Error while running singa on mesos
>>>>
>>>> If you are using Docker (without Mesos), it could be a network routing
>>>> problem. You may need to configure Docker to set up the network so that
>>>> node0 and node2 can be accessed from node1.
>>>> We are trying your configuration.
>>>>
>>>> regards,
>>>> wang wei
>>>>
>>>>
>>>> On Wed, Jun 22, 2016 at 10:32 AM, Wang Wei <[email protected]> wrote:
>>>>
>>>>> Hi Venkat,
>>>>>
>>>>> It is probably a problem with the node addresses.
>>>>> Please replace node0 and node2 with their IP addresses.
>>>>>
>>>>> regards,
>>>>> wei
>>>>>
>>>>> On Wed, Jun 22, 2016 at 2:40 AM, Venkat Katta <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I tried running without Mesos and got the same error:
>>>>>>
>>>>>>
>>>>>> root@node0:~/incubator-singa# ./bin/singa-run.sh -conf
>>>>>> examples/cifar10/hybrid.conf
>>>>>> Unique JOB_ID is 4
>>>>>> Record job information to
>>>>>> /tmp/singa-log/job-info/job-4-20160621-183305
>>>>>> Executing @ node2 : cd /root/incubator-singa; source
>>>>>> /root/incubator-singa/conf/profile; ./singa -singa_conf
>>>>>> /root/incubator-singa/conf/singa.conf -singa_job 4 -conf
>>>>>> /root/incubator-singa/examples/cifar10/hybrid.conf
>>>>>> Executing @ node0 : cd /root/incubator-singa; source
>>>>>> /root/incubator-singa/conf/profile; ./singa -singa_conf
>>>>>> /root/incubator-singa/conf/singa.conf -singa_job 4 -conf
>>>>>> /root/incubator-singa/examples/cifar10/hybrid.conf
>>>>>> F0621 18:33:24.171468   725 socket.cc:98] Check failed: port != -1
>>>>>> (-1 vs. -1) tcp://node2:*
>>>>>> *** Check failure stack trace: ***
>>>>>>     @     0x7f10d0a6b9fd  google::LogMessage::Fail()
>>>>>>     @     0x7f10d0a6d89d  google::LogMessage::SendToLog()
>>>>>>     @     0x7f10d0a6b5ec  google::LogMessage::Flush()
>>>>>>     @     0x7f10d0a6e1be  google::LogMessageFatal::~LogMessageFatal()
>>>>>>     @     0x7f10d0e05d79  singa::Router::Bind()
>>>>>>     @     0x7f10d0d7a8bc  singa::Driver::Train()
>>>>>>     @     0x7f10d0d7f48b  singa::Driver::Train()
>>>>>>     @           0x40c915  main
>>>>>>     @     0x7f10c5f13f45  (unknown)
>>>>>>     @           0x40cb7e  (unknown)
>>>>>> F0621 18:33:06.244278  1042 socket.cc:98] Check failed: port != -1
>>>>>> (-1 vs. -1) tcp://node0:*
>>>>>> *** Check failure stack trace: ***
>>>>>>     @     0x7f6d4516d9fd  google::LogMessage::Fail()
>>>>>>     @     0x7f6d4516f89d  google::LogMessage::SendToLog()
>>>>>>     @     0x7f6d4516d5ec  google::LogMessage::Flush()
>>>>>>     @     0x7f6d451701be  google::LogMessageFatal::~LogMessageFatal()
>>>>>>     @     0x7f6d45507d79  singa::Router::Bind()
>>>>>>     @     0x7f6d4547c8bc  singa::Driver::Train()
>>>>>>     @     0x7f6d4548148b  singa::Driver::Train()
>>>>>>     @           0x40c915  main
>>>>>>     @     0x7f6d3a615f45  (unknown)
>>>>>>     @           0x40cb7e  (unknown)
>>>>>> bash: line 1:   725 Aborted                 (core dumped) ./singa
>>>>>> -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf
>>>>>> /root/incubator-singa/examples/cifar10/hybrid.conf -host node2
>>>>>> bash: line 1:  1042 Aborted                 (core dumped) ./singa
>>>>>> -singa_conf /root/incubator-singa/conf/singa.conf -singa_job 4 -conf
>>>>>> /root/incubator-singa/examples/cifar10/hybrid.conf -host node0
>>>>>> E0621 18:33:07.467438  1067 job_manager.cc:156] job 4 not exists
>>>>>>
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Wang Wei <[email protected]>
>>>>>> *Sent:* Tuesday, June 21, 2016 7:09:46 PM
>>>>>> *To:* Venkat Katta
>>>>>> *Cc:* [email protected]
>>>>>> *Subject:* Re: Error while running singa on mesos
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Can you try to run it without Mesos?
>>>>>> 1. Compile SINGA with enable-dist.
>>>>>> 2. Change conf/singa.conf to set the ZooKeeper host.
>>>>>> 3. Update conf/hostfile, one line per machine.
>>>>>> 4. Update conf/profile to export LD_LIBRARY_PATH.
>>>>>>
>>>>>> regards,
>>>>>> Wei
>>>>>>
>>>>>> On Tue, Jun 21, 2016 at 8:52 PM, Venkat Katta <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>
>>>>>>> I am trying to run SINGA on Mesos in a fully distributed
>>>>>>> architecture. I built the Docker images as described in the
>>>>>>> documentation. I am using Mesos 0.28.2 and SINGA 0.3-rc3. I am running
>>>>>>> each Docker container with the --net=host flag so that it takes the IP
>>>>>>> of the host system. SINGA works as long as the workers are all on one
>>>>>>> machine. When I try to use two machines for training, it shows this
>>>>>>> error:
>>>>>>>
>>>>>>>
>>>>>>> F0617 10:00:43.862246 2742 socket.cc:98] Check failed: port != -1
>>>>>>> (-1 vs. -1) tcp://localhost:*
>>>>>>>
>>>>>>>
>>>>>>> So, when running the scheduler, do we need to give it a hostfile
>>>>>>> containing all the hosts? How does it know about the remaining hosts
>>>>>>> in the cluster?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>>
>>>>>>> Venkat Satish Katta.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
