Hi Jim,

Thanks for the reply. Unfortunately the answer doesn't seem to be that simple. I do have the ssh stuff worked out (believe me, I've googled the heck out of this thing!); the qsub test won't work without it. I can scp between the two nodes in all combinations of user "globus" or "labkey", logged into either node, and in either direction.
Thanks,
Brian

On Thu, Dec 3, 2009 at 1:33 PM, Jim Basney <jbas...@ncsa.uiuc.edu> wrote:
> Hi Brian,
>
> "Host key verification failed" is an ssh client-side error. The top hit
> from Google for this error message is
> <http://www.securityfocus.com/infocus/1806> which looks like a good
> reference on the topic. I suspect you need to populate and distribute
> /etc/ssh_known_hosts files between your nodes.
>
> -Jim
>
> Brian Pratt wrote:
> > Actually more of a logging question - I don't expect anyone to solve the
> > problem by remote control, but I'm having a bit of trouble figuring out
> > which node (server or client) the error is coming from.
> >
> > Here's the scenario: a node running globus/ws-gram/pbs_server/pbs_sched and
> > one running pbs_mom. Using the globus simple ca. Job-submitting user is
> > "labkey" on the globus node, and there's a labkey user on the client node
> > too.
> >
> > I can watch decrypted SSL traffic on the client node with ssldump and
> > simpleca private key and can see the job script being handed to the pbs_mom
> > node.
> >
> > passwordless ssh/scp is configured between the two nodes.
> >
> > job-submitting user's .globus directory is shared via nfs with the mom
> > node. UIDs agree on both nodes. globus user can write to it.
> >
> > Jobs submitted with qsub are fine. "qsub -o
> > ~labkey/globus_test/qsubtest_output.txt -e
> > ~labkey/globus_test/qsubtest_err.txt qsubtest"
> > cat qsubtest
> > #!/bin/bash
> > date
> > env
> > logger "hello from qsubtest, I am $(whoami)"
> > and indeed it executes on the pbs_mom client node.
> >
> > Jobs submitted with fork are fine. "globusrun-ws -submit -f gramtest_fork"
> > cat gramtest_fork
> > <job>
> > <executable>/mnt/userdata/gramtest_fork.sh</executable>
> > <stdout>globus_test/gramtest_fork_stdout</stdout>
> > <stderr>globus_test/gramtest_fork_stderr</stderr>
> > </job>
> > but those run local to the globus node, of course.
> >
> > But a job submitted as
> > globusrun-ws -submit -f gramtest_pbs -Ft PBS
> >
> > cat gramtest_pbs
> > <job>
> > <executable>/usr/bin/env</executable>
> > <stdout>gramtest_pbs_stdout</stdout>
> > <stderr>gramtest_pbs_stderr</stderr>
> > </job>
> >
> > Gives this: cat globusrun-ws -submit -f gramtest_pbs -Ft PBS
> > Host key verification failed.
> > /bin/touch: cannot touch
> > `/home/labkey/.globus/c5acdc30-e04c-11de-9567-d32d83561bbd/exit.0': No such
> > file or directory
> > /var/spool/torque/mom_priv/jobs/1.domu-12-31-38-00-b4-b5.compute-1.internal.SC: 59: cannot open
> > /home/labkey/.globus/c5acdc30-e04c-11de-9567-d32d83561bbd/exit.0: No such
> > file
> > [: 59: !=: unexpected operator
> >
> > I'm stumped - what piece of the authentication picture am I missing? And
> > how to identify the actor that emitted that failure message?
> >
> > Thanks,
> >
> > Brian Pratt
>
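
One way to narrow down which account and which direction trips the host key check is to run ssh non-interactively, the way a batch job would. The sketch below is only a guess at a useful probe, not something taken from the Globus or Torque documentation: check_host_keys is a made-up helper name, and the host names in the usage comment are placeholders. It relies on the standard OpenSSH options BatchMode and StrictHostKeyChecking, so that ssh fails instead of prompting when a host key is unknown, and it prints whatever error ssh reported.

```shell
#!/bin/bash
# Hypothetical helper: try a non-interactive ssh to each host given as an
# argument and report the result for the current user.
# BatchMode=yes disables all prompts; StrictHostKeyChecking=yes refuses
# hosts that are missing from known_hosts instead of adding them.
check_host_keys() {
    chk_status=0
    for chk_host in "$@"; do
        # Capture ssh's stderr (the error text) and discard its stdout.
        if chk_err=$(ssh -o BatchMode=yes -o StrictHostKeyChecking=yes \
                         -o ConnectTimeout=5 "$chk_host" true 2>&1 >/dev/null); then
            echo "OK   $(whoami) -> $chk_host"
        else
            echo "FAIL $(whoami) -> $chk_host: $chk_err"
            chk_status=1
        fi
    done
    return "$chk_status"
}

# Usage (placeholder host names, not the real nodes):
#   check_host_keys momnode momnode.example.internal
```

Running this as both labkey and globus, on both nodes, and with both the short and the fully qualified host names should show which combination lacks a known_hosts entry, since ssh looks up the key under the exact name it was given. A FAIL line can also come from an authentication or network problem, which is why the captured ssh message is printed alongside it.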