I can't get a job to run for more than a few hours before it fails with:

SSHConnectionLibSSH::isConnected(): server timeout.
SSH error: Failed to resolve hostname legion.rc.ucl.ac.uk (Name or service not known)
"Cannot connect to ssh server [email protected]:22"
Warning: "Cannot connect to ssh server"
SSHConnectionLibSSH::isConnected(): server timeout.
SSH error: Failed to resolve hostname legion.rc.ucl.ac.uk (Name or service not known)
"Cannot connect to ssh server [email protected]:22"
Warning: "Cannot connect to ssh server"

Meanwhile we had ping running in another window, and it showed no errors and no loss of network or name service. I think the host just didn't respond to the ssh call immediately, the call timed out, and xtalopt then dies. I know how to fix this but have to find time.

Ron

---
Ronald Cohen
Geophysical Laboratory
Carnegie Institution
5251 Broad Branch Rd., N.W.
Washington, D.C. 20015
[email protected]
office: 202-478-8937
skype: ronaldcohen
https://twitter.com/recohen3
https://www.linkedin.com/profile/view?id=163327727

On Sun, Aug 16, 2015 at 4:35 PM, Patrick Avery <[email protected]> wrote:
> Hey Ron,
>
> So, we have been making several updates for a new release that is coming out
> soon. We MIGHT have already fixed this issue (although I don't recall
> explicitly fixing it). But I ran a test today to see what would happen. Let
> me know if you think this test adequately mimics your glitch that you found:
>
> I submitted a couple of jobs with XtalOpt, then disconnected my wifi for
> about 20 seconds (so the connection to the remote cluster would fail). Then,
> I reconnected it, and it read the output from the runs and updated
> successfully - no job restarts.
>
> I tried it again for a longer period of time (I disconnected the wifi for
> about 3 minutes). After several server timeouts (and it mentioned "Warning:
> "Cannot connect to ssh server"" three times in that time period), I
> reconnected the wifi. Unfortunately, the run did not continue - it appeared
> to be frozen (something we may want to fix).
> But after exiting out and resuming the run, it took it a while, but it
> updated the structures successfully from the output - no job restarts.
>
> Thanks,
> Patrick
>
> On Fri, Aug 14, 2015 at 4:35 PM, Cohen, Ronald <[email protected]>
> wrote:
>>
>> I had fixed this in an earlier version but don't remember how.
>> Sometimes the connection to the server or nameserver goes down (about
>> once a day) and I see an error like:
>>
>> SSHConnectionLibSSH::isConnected(): server timeout.
>> SSH error: Failed to resolve hostname legion.rc.ucl.ac.uk (Name or
>> service not known)
>> "Cannot connect to ssh server [email protected]:22"
>> Warning: "Cannot connect to ssh server"
>>
>> However, jobs are still running on the server and it comes back, but
>> in the meantime xtalopt hangs and never recovers without a restart,
>> and loss of the running jobs. It should just wait until the server
>> connection comes back.
>>
>> Ron
>>
>> ---
>> Ronald Cohen
>> Geophysical Laboratory
>> Carnegie Institution
>> 5251 Broad Branch Rd., N.W.
>> Washington, D.C. 20015
>> [email protected]
>> office: 202-478-8937
>> skype: ronaldcohen
>> https://twitter.com/recohen3
>> https://www.linkedin.com/profile/view?id=163327727
>
> ------------------------------------------------------------------------------
_______________________________________________
Avogadro-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/avogadro-devel
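
For anyone reading this thread later: the behaviour asked for above ("It should just wait until the server connection comes back") amounts to a retry loop with backoff around the connection check. The following is only a rough sketch of that idea in Python, not XtalOpt's actual C++ code and not the fix Ron mentions. The names wait_for_connection and try_connect, and all of the parameters, are made up for illustration; try_connect stands in for whatever call opens or checks the SSH session.

```python
import time
from typing import Callable, Optional


def wait_for_connection(
    try_connect: Callable[[], bool],
    initial_delay: float = 5.0,
    max_delay: float = 300.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry try_connect() with exponential backoff instead of giving up.

    try_connect is a hypothetical callable that returns True when the
    server is reachable; it may also raise OSError (for example on a
    DNS failure or a timeout). With max_attempts=None the loop waits
    indefinitely, which matches "just wait until the connection comes back".
    """
    delay = initial_delay
    attempt = 0
    while True:
        attempt += 1
        try:
            if try_connect():
                return True
        except OSError as err:
            # A transient failure: report it and keep waiting.
            print(f"connection attempt {attempt} failed: {err}")
        if max_attempts is not None and attempt >= max_attempts:
            return False
        sleep(delay)
        # Double the wait each time, but never beyond max_delay.
        delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    # Simulated outage: three failed checks, then the server is back.
    outcomes = iter([False, False, False, True])
    waits = []
    ok = wait_for_connection(
        lambda: next(outcomes), initial_delay=1.0, sleep=waits.append
    )
    print(ok, waits)  # True [1.0, 2.0, 4.0]
```

The sleep function is passed in as a parameter only so the loop can be exercised without real waiting. The key design point is that a failed check is treated as "try again later" and never as a reason to abandon jobs that are still running on the cluster.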
