I can't get a job to run for more than a few hours before it fails with:

SSHConnectionLibSSH::isConnected(): server timeout.

SSH error:  Failed to resolve hostname legion.rc.ucl.ac.uk (Name or
service not known)

"Cannot connect to ssh server [email protected]:22"

Warning:  "Cannot connect to ssh server"

SSHConnectionLibSSH::isConnected(): server timeout.

SSH error:  Failed to resolve hostname legion.rc.ucl.ac.uk (Name or
service not known)

"Cannot connect to ssh server [email protected]:22"

Warning:  "Cannot connect to ssh server"

Meanwhile we had ping running in another window, and it showed no
errors and no loss of network or name service.

I think the host just didn't respond to the ssh call immediately, the
call timed out, and xtalopt then died.
I know how to fix this but have to find the time.
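
The general idea would be to keep waiting and retrying instead of giving
up on the first timeout. Below is a rough sketch only, written in Python for
brevity rather than in XtalOpt's C++. The names wait_for_connection and
check are hypothetical and are not part of XtalOpt or libssh; check stands
in for whatever test reports that the ssh server is reachable again.

```python
import time


def wait_for_connection(check, initial_delay=5.0, max_delay=300.0,
                        max_wait=None, sleep=time.sleep):
    """Poll check() until it returns True, backing off between attempts.

    check     -- hypothetical callable; returns True once the server answers.
    max_wait  -- optional cap (seconds) on total time spent sleeping.
    Returns the number of failed attempts that preceded success.
    """
    delay = initial_delay
    waited = 0.0
    failures = 0
    while True:
        try:
            if check():
                return failures
        except OSError:
            # Name-resolution failures and socket timeouts are OSError
            # subclasses; treat them as "not connected yet", not as fatal.
            pass
        failures += 1
        if max_wait is not None and waited >= max_wait:
            raise TimeoutError("server did not come back within max_wait")
        sleep(delay)
        waited += delay
        # Double the pause after each failure, up to max_delay.
        delay = min(delay * 2, max_delay)
```

In practice check could be something as simple as trying to open a TCP
connection to port 22 on the cluster with a short timeout. The point is
that the optimizer keeps its list of running jobs and resumes polling
them once check succeeds, rather than dying or restarting the jobs.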

Ron

---
Ronald Cohen
Geophysical Laboratory
Carnegie Institution
5251 Broad Branch Rd., N.W.
Washington, D.C. 20015
[email protected]
office: 202-478-8937
skype: ronaldcohen
https://twitter.com/recohen3
https://www.linkedin.com/profile/view?id=163327727


On Sun, Aug 16, 2015 at 4:35 PM, Patrick Avery <[email protected]> wrote:
> Hey Ron,
>
> So, we have been making several updates for a new release that is coming out
> soon. We MIGHT have already fixed this issue (although I don't recall
> explicitly fixing it). But I ran a test today to see what would happen. Let
> me know if you think this test adequately mimics the glitch you found:
>
> I submitted a couple of jobs with XtalOpt, then disconnected my wifi for
> about 20 seconds (so the connection to the remote cluster would fail). Then,
> I reconnected it, and it read the output from the runs and updated
> successfully - no job restarts.
>
> I tried it again for a longer period of time (I disconnected the wifi for
> about 3 minutes). After several server timeouts (and it mentioned "Warning:
> "Cannot connect to ssh server"" three times in that time period), I
> reconnected the wifi. Unfortunately, the run did not continue - it appeared
> to be frozen (something we may want to fix). After exiting and resuming the
> run, it took a while, but it updated the structures successfully from the
> output - no job restarts.
>
> Thanks,
> Patrick
>
> On Fri, Aug 14, 2015 at 4:35 PM, Cohen, Ronald <[email protected]>
> wrote:
>>
>> I had fixed this in an earlier version but don't remember how.
>> Sometimes the connection to the server or nameserver goes down (about
>> once a day) and I see an error like:
>>
>> SSHConnectionLibSSH::isConnected(): server timeout.
>> SSH error:  Failed to resolve hostname legion.rc.ucl.ac.uk (Name or
>> service not known)
>> "Cannot connect to ssh server [email protected]:22"
>> Warning:  "Cannot connect to ssh server"
>>
>> However, jobs are still running on the server and the connection comes
>> back, but in the meantime xtalopt hangs and never recovers without a
>> restart, which loses the running jobs. It should just wait until the
>> server connection comes back.
>>
>> Ron
>>
>> ---
>> Ronald Cohen
>> Geophysical Laboratory
>> Carnegie Institution
>> 5251 Broad Branch Rd., N.W.
>> Washington, D.C. 20015
>> [email protected]
>> office: 202-478-8937
>> skype: ronaldcohen
>> https://twitter.com/recohen3
>> https://www.linkedin.com/profile/view?id=163327727
>
>

------------------------------------------------------------------------------
_______________________________________________
Avogadro-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/avogadro-devel
