I can't recall anything that has changed in the parallel codebase recently.
You could try the 0.2 version just to be sure. Maybe the julia
processes on the cluster are dying soon after they launch, which would
explain the closed connection while reading the port information? Could you
try launching julia manually on any of the nodes of the cluster, just to
ensure that the julia setup on those nodes is OK?
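
If it helps narrow things down, here is a minimal sketch of how one could check the "worker dies before it reports its port" theory independently of Julia. It is written in Python, it is not the ClusterManagers.jl code, and the names (read_first_line, make_listener, fake_worker) and the "9009" message are made up for illustration. The idea is simply that an empty read right after accept means the peer closed the connection before sending anything.

```python
import socket
import threading


def make_listener():
    """Open a TCP listener on an ephemeral localhost port (illustrative only)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    return srv


def read_first_line(server_sock, timeout=5.0):
    """Accept one connection and return the first line the peer sends.

    Returns None if the peer closes the connection before a full line
    arrives, which is the symptom described in this thread. Raises
    socket.timeout if the peer stays connected but silent.
    """
    conn, _addr = server_sock.accept()
    with conn:
        conn.settimeout(timeout)
        buf = b""
        while b"\n" not in buf:
            chunk = conn.recv(1024)
            if not chunk:  # empty read: the peer closed the connection
                return None
            buf += chunk
        return buf.split(b"\n", 1)[0].decode().strip()


def fake_worker(port, message):
    """Stand-in for a launched job: connect, optionally report, then exit."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        if message is not None:
            s.sendall(message)


if __name__ == "__main__":
    srv = make_listener()
    port = srv.getsockname()[1]
    # First a worker that reports a (made-up) port, then one that dies silently.
    for message in (b"9009\n", None):
        t = threading.Thread(target=fake_worker, args=(port, message))
        t.start()
        print("worker reported:", read_first_line(srv))
        t.join()
    srv.close()
```

If a real job connecting to a listener like this also produces an empty read, the problem is most likely on the worker side (julia failing to start on the node) rather than in the code that reads the port.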


On Wed, Jan 22, 2014 at 5:07 AM, David Bindel <[email protected]> wrote:

>
>
> I wrote a cluster manager for launching jobs on
> HTCondor <http://github.com/dbindel/ClusterManagers.jl> a little while
> back, and was having good luck with it, but now I seem to be
> having some trouble.  The basic logic is that Julia starts a TCP server and
> launches jobs on the cluster that then connect to the server and send back
> their information (by piping through telnet).  The problem is that
> somewhere between when the connection is accepted and when Julia tries to
> read the port information, "the connection is closed by the foreign host".
>
> I've rebuilt Julia between when I last tested this and now, so it's
> possible that there was some change in Julia; it's also possible that there
> was a change in the cluster configuration, since that's equally a moving
> target.  But I'm a bit foxed, and any insights would be welcome.
>
> Cheers,
> David
>
