D'oh!  Illegal instruction.  Somehow it happens when I submit via condor 
but not when I ssh into the nodes directly.  Thanks!

On Tuesday, January 21, 2014 10:36:50 PM UTC-5, Amit Murthy wrote:
>
> I can't recall anything that has changed in the parallel codebase 
> recently. You could try with the 0.2 version just to be sure. Maybe the 
> julia processes on the cluster are dying soon after they launch, and hence 
> the closed connection while reading port information? Could you try 
> launching julia manually on any of the nodes of the cluster, just to ensure 
> that the julia setup on those nodes is OK?
>
>
> On Wed, Jan 22, 2014 at 5:07 AM, David Bindel <[email protected]> wrote:
>
>>
>>
>> I wrote a cluster manager for launching jobs on HTCondor 
>> <http://github.com/dbindel/ClusterManagers.jl> a little while back, 
>> and was having good luck with it, but now I seem to be 
>> having some trouble.  The basic logic is that Julia starts a TCP server and 
>> launches jobs on the cluster that then connect to the server and send back 
>> their information (by piping through telnet).  The problem is that 
>> somewhere between when the connection is accepted and when Julia tries to 
>> read the port information, "the connection is closed by the foreign host".
>>
>> I've rebuilt Julia between when I last tested this and now, so it's 
>> possible that there was some change in Julia; it's also possible that there 
>> was a change in the cluster configuration, since that's equally a moving 
>> target.  But I'm a bit foxed, and any insights would be welcome.
>>
>> Cheers,
>> David
>>
>
>
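
For anyone reading this thread later, the handshake described in the quoted 
message (a TCP server that accepts a connection from each launched job and 
reads back its port information) can be sketched roughly as below. This is a 
minimal Python illustration of the pattern, not the ClusterManagers.jl code. 
The names read_worker_info, fake_worker and collect, and the "host:port" line 
format, are assumptions made up for the example. It also shows how a job that 
dies right after connecting surfaces as a closed connection on the server side.

```python
import socket
import threading


def read_worker_info(conn):
    """Read one 'host:port' line (assumed wire format) from an accepted connection."""
    with conn, conn.makefile("r", encoding="utf-8") as stream:
        line = stream.readline()
    if not line.strip():
        # An empty read means the peer hung up before sending anything,
        # e.g. because the worker process died right after launch.
        raise ConnectionError("worker closed the connection before sending port info")
    host, _, port = line.strip().rpartition(":")
    return host, int(port)


def fake_worker(server_port, payload):
    """Hypothetical stand-in for a launched job: connect, send payload, hang up."""
    with socket.create_connection(("127.0.0.1", server_port)) as sock:
        sock.sendall(payload)


def collect(payload):
    """Start a server on an ephemeral port, launch one fake worker, read its info."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        server.settimeout(5)
        port = server.getsockname()[1]
        worker = threading.Thread(target=fake_worker, args=(port, payload))
        worker.start()
        try:
            conn, _ = server.accept()
            conn.settimeout(5)
            return read_worker_info(conn)
        finally:
            worker.join()


if __name__ == "__main__":
    print(collect(b"node01:9009\n"))
```

Calling collect(b"") simulates a worker that exits before writing anything, 
and raises ConnectionError on the server side. A process killed by an illegal 
instruction immediately after connecting would look much the same to the server.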
