[ 
https://issues.apache.org/jira/browse/SINGA-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558890#comment-14558890
 ] 

wangwei commented on SINGA-2:
-----------------------------

This is most likely caused by ZeroMQ, which requires bind to happen before 
connect for inproc communication.

http://api.zeromq.org/4-0:zmq-connect says that "The first exception is when 
using the inproc:// transport: you must call zmq_bind() before calling 
zmq_connect()."

I have updated the code and will commit after more testing.
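
To illustrate the ordering constraint, here is a minimal sketch of one way to 
make connecting threads wait until the bind has completed. It is not the 
change made for this issue. InprocRegistry, RunWithBindBarrier and the 
endpoint name are hypothetical stand-ins so that the example compiles without 
ZeroMQ; in real code Bind() and Connect() would be zmq_bind()/zmq_connect() 
or the corresponding czmq zsock_* calls.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the inproc transport. It only mimics the rule
// quoted above: connecting to an endpoint that is not yet bound fails.
class InprocRegistry {
 public:
  // Records the endpoint as bound. Returns 0 on success.
  int Bind(const std::string& endpoint) {
    std::lock_guard<std::mutex> lock(mu_);
    bound_.push_back(endpoint);
    return 0;
  }

  // Returns 0 if the endpoint has already been bound, -1 otherwise.
  int Connect(const std::string& endpoint) {
    std::lock_guard<std::mutex> lock(mu_);
    for (const auto& e : bound_) {
      if (e == endpoint) return 0;
    }
    return -1;
  }

 private:
  std::mutex mu_;
  std::vector<std::string> bound_;
};

// Starts num_connectors connecting threads and one binding thread.
// Each connector blocks on a shared_future that is fulfilled only after
// Bind() has returned, so no Connect() can run before the bind.
// Returns the number of failed connects.
int RunWithBindBarrier(int num_connectors) {
  assert(num_connectors >= 0);
  InprocRegistry registry;
  const std::string endpoint = "inproc://router";  // assumed name

  std::promise<void> bound;
  std::shared_future<void> bound_signal = bound.get_future().share();
  std::atomic<int> failures{0};

  std::vector<std::thread> connectors;
  for (int i = 0; i < num_connectors; ++i) {
    connectors.emplace_back([&registry, &endpoint, &failures, bound_signal] {
      bound_signal.wait();  // block until the binder signals completion
      if (registry.Connect(endpoint) != 0) ++failures;
    });
  }

  std::thread binder([&registry, &endpoint, &bound] {
    registry.Bind(endpoint);
    bound.set_value();  // release all waiting connectors
  });

  binder.join();
  for (auto& t : connectors) t.join();
  return failures.load();
}
```

With the barrier in place, RunWithBindBarrier(8) is expected to report no 
failed connects regardless of how the threads are scheduled. Without the 
wait, a connector that runs before the binder would see -1, which matches 
the intermittent failure in the report below.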

> Check failed: zsock_connect
> ---------------------------
>
>                 Key: SINGA-2
>                 URL: https://issues.apache.org/jira/browse/SINGA-2
>             Project: Singa
>          Issue Type: Bug
>         Environment: Ubuntu 12.04
> gcc 4.9.2
>            Reporter: Sheng Wang
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> When running the cifar10 example:
> $./singa -model=examples/cifar10/model.conf 
> -cluster=examples/cifar10/cluster.conf
> I sometimes get the following error:
> F0521 16:20:56.078801 39822 socket.cc:29] Check failed: 
> zsock_connect(dealer_,"%s", endpoint.c_str()) == 0 (-1 vs. 0) 
> F0521 16:20:56.079180 39821 socket.cc:29] Check failed: 
> zsock_connect(dealer_,"%s", endpoint.c_str()) == 0 (-1 vs. 0) 
> *** Check failure stack trace: ***
> *** Check failure stack trace: ***
>     @     0x7f0e8f9abb7d  google::LogMessage::Fail()
>     @     0x7f0e8f9adc7f  google::LogMessage::SendToLog()
>     @     0x7f0e8f9ab76c  google::LogMessage::Flush()
>     @     0x7f0e8f9ae51d  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f0e8f9abb7d  google::LogMessage::Fail()
>     @     0x7f0e8f9adc7f  google::LogMessage::SendToLog()
>     @     0x7f0e8f9ab76c  google::LogMessage::Flush()
>     @     0x7f0e8f9ae51d  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f0e8ffc3626  singa::Dealer::Connect()
>     @     0x7f0e8ff71d16  singa::Server::Run()
>     @     0x7f0e8f6db490  (unknown)
>     @     0x7f0e8ee38e9a  start_thread
>     @     0x7f0e8f14138d  (unknown)
> Aborted (core dumped)
> Rerunning or recompiling can make the problem go away.
> The error is hard to track down, so it needs further investigation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
