D'oh! Illegal instruction. Somehow it happens when I submit via condor but not when I ssh into the nodes directly. Thanks!
On Tuesday, January 21, 2014 10:36:50 PM UTC-5, Amit Murthy wrote:
> I can't recall anything that has changed in the parallel codebase
> recently. You could try with the 0.2 version just to be sure. Maybe the
> julia processes on the cluster are dying soon after they launch, and
> hence the closed connection while reading port information? Could you
> try launching julia manually on any of the nodes of the cluster, just to
> ensure that the julia setup on those nodes is OK?
>
> On Wed, Jan 22, 2014 at 5:07 AM, David Bindel wrote:
>
>> I wrote a cluster manager for launching jobs on HTCondor
>> <http://github.com/dbindel/ClusterManagers.jl> a little while back,
>> and was having good luck with it, but now I seem to be having some
>> trouble. The basic logic is that Julia starts a TCP server and
>> launches jobs on the cluster that then connect to the server and send
>> back their information (by piping through telnet). The problem is that
>> somewhere between when the connection is accepted and when Julia tries
>> to read the port information, "the connection is closed by the foreign
>> host".
>>
>> I've rebuilt Julia between when I last tested this and now, so it's
>> possible that there was some change in Julia; it's also possible that
>> there was a change in the cluster configuration, since that's equally
>> a moving target. But I'm a bit foxed, and any insights would be
>> welcome.
>>
>> Cheers,
>> David
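
For anyone reading along, the handshake described in the quoted message (a server accepts a connection, then reads the port a worker reports) can be sketched roughly as below. This is only an illustration in Python, not the actual ClusterManagers.jl code: the names `read_worker_port`, `fake_worker`, and `run_handshake` are hypothetical, and the one-line "port number plus newline" message format is an assumption. It shows why a worker that dies right after connecting looks, from the server side, like a connection closed before any port information arrives.

```python
import socket
import threading


def read_worker_port(server_sock, timeout=5.0):
    """Accept one connection and read a single line holding the worker's port.

    Returns the port as an int, or None if the peer closed the connection
    before sending anything (the failure mode described in the thread).
    """
    server_sock.settimeout(timeout)
    conn, _addr = server_sock.accept()
    with conn:
        conn.settimeout(timeout)
        data = b""
        while not data.endswith(b"\n"):
            chunk = conn.recv(1024)
            if not chunk:  # empty read: the peer closed the connection
                break
            data += chunk
    text = data.decode().strip()
    return int(text) if text else None


def fake_worker(server_port, report):
    """Hypothetical stand-in for a launched job (assumption, not real worker code)."""
    with socket.create_connection(("127.0.0.1", server_port)) as s:
        if report is not None:
            s.sendall(f"{report}\n".encode())
        # When report is None, the worker exits here without sending anything,
        # mimicking a process that crashes right after connecting.


def run_handshake(report):
    """Run one server/worker exchange on localhost and return what the server read."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        srv.listen(1)
        port = srv.getsockname()[1]
        worker = threading.Thread(target=fake_worker, args=(port, report))
        worker.start()
        result = read_worker_port(srv)
        worker.join()
        return result


if __name__ == "__main__":
    print(run_handshake(9009))  # healthy worker reports its port
    print(run_handshake(None))  # worker exits before reporting
```

In the second call the server accepts the connection normally and only finds out on the first read that nothing is coming, which matches the symptom of a worker process crashing immediately after launch.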
