Let me amend that - I do think this is sniffing around the right tree,
which is why I said this is in some ways more of a logging question.  It
does look very much like an ssh issue, so what I really need is to
figure out exactly what connection parameters were in use for the
failure.  They seem to differ in some respect from those used in the
qsub transactions.  What I could really use is a hint at how to lay
eyes on that.
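
One idea, in case it suggests something (a sketch only - I'm guessing
the copy-back happens via scp on the mom node, and the paths are from
my setup): move the real binary aside and drop in a logging wrapper.

  # on the pbs_mom node, as root
  mv /usr/bin/scp /usr/bin/scp.real
  # then install this as the new /usr/bin/scp, mode 755:
  #!/bin/bash
  # log the calling user and full argument list, then run the real scp
  logger -t scp-wrap "user=$(whoami) args=$*"
  exec /usr/bin/scp.real "$@"

Grepping syslog for scp-wrap after a failing submit should then show
exactly which user, host, and path the transfer used.  The same trick
on ssh would catch a non-scp code path.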

Thanks,

Brian

On Thu, Dec 3, 2009 at 1:38 PM, Brian Pratt <brian.pr...@insilicos.com> wrote:

> Hi Jim,
>
> Thanks for the reply.  Unfortunately the answer doesn't seem to be that
> simple - I do have the ssh stuff worked out (believe me, I've googled
> the heck out of this thing!); the qsub test won't work without it.  I
> can scp between the two nodes in all combinations of user "globus" or
> "labkey", logged into either node, and in either direction.
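>
> Just to be concrete, the matrix I checked looks like this (hostnames
> are placeholders for my two nodes), run from each node in turn, as
> "labkey" and again as "globus":
>
>   scp /tmp/probe labkey@othernode:/tmp/
>   scp /tmp/probe globus@othernode:/tmp/
>
> No password or host-key prompt in any combination.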
>
> Thanks,
>
> Brian
>
> On Thu, Dec 3, 2009 at 1:33 PM, Jim Basney <jbas...@ncsa.uiuc.edu> wrote:
>
>> Hi Brian,
>>
>> "Host key verification failed" is an ssh client-side error. The top hit
>> from Google for this error message is
>> <http://www.securityfocus.com/infocus/1806> which looks like a good
>> reference on the topic. I suspect you need to populate and distribute
>> /etc/ssh_known_hosts files between your nodes.
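>>
>> If that's the gap, one way to seed the file (node1/node2 standing in
>> for your real hostnames) is something like:
>>
>>   ssh-keyscan -t rsa node1 node2 >> /etc/ssh_known_hosts
>>
>> run once on each node, then verified with a fresh ssh between them.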
>>
>> -Jim
>>
>> Brian Pratt wrote:
>> > Actually more of a logging question - I don't expect anyone to solve the
>> > problem by remote control, but I'm having a bit of trouble figuring out
>> > which node (server or client) the error is coming from.
>> >
>> > Here's the scenario: a node running globus/ws-gram/pbs_server/pbs_sched
>> > and one running pbs_mom, using the globus simple ca.  The job-submitting
>> > user is "labkey" on the globus node, and there's a labkey user on the
>> > client node too.
>> >
>> > I can watch decrypted SSL traffic on the client node with ssldump and
>> > the simpleca private key, and can see the job script being handed to
>> > the pbs_mom node.
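>> >
>> > (For reference, my ssldump invocation is along these lines - the
>> > interface, port, and key path are specific to my setup:
>> >
>> >   ssldump -A -d -k simpleca-key.pem -i eth0 port 8443
>> >
>> > with 8443 being the globus container port here.)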
>> >
>> > Passwordless ssh/scp is configured between the two nodes.
>> >
>> > The job-submitting user's .globus directory is shared via NFS with the
>> > mom node.  UIDs agree on both nodes, and the globus user can write to it.
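>> >
>> > (By "agree" and "can write" I mean checks along these lines, on both
>> > nodes:
>> >
>> >   id -u labkey; id -u globus
>> >   sudo -u globus touch ~labkey/.globus/write_probe
>> >
>> > same numbers everywhere, and the touch succeeds.)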
>> >
>> > Jobs submitted with qsub are fine.  "qsub -o
>> > ~labkey/globus_test/qsubtest_output.txt -e
>> > ~labkey/globus_test/qsubtest_err.txt qsubtest"
>> >
>> > cat qsubtest
>> >    #!/bin/bash
>> >    date
>> >    env
>> >    logger "hello from qsubtest, I am $(whoami)"
>> >
>> > and indeed it executes on the pbs_mom client node.
>> >
>> > Jobs submitted with fork are fine.  "globusrun-ws -submit -f gramtest_fork"
>> >
>> > cat gramtest_fork
>> > <job>
>> >   <executable>/mnt/userdata/gramtest_fork.sh</executable>
>> >   <stdout>globus_test/gramtest_fork_stdout</stdout>
>> >   <stderr>globus_test/gramtest_fork_stderr</stderr>
>> > </job>
>> >
>> > but those run local to the globus node, of course.
>> >
>> > But a job submitted as
>> > globusrun-ws -submit -f gramtest_pbs -Ft PBS
>> >
>> > cat gramtest_pbs
>> > <job>
>> >   <executable>/usr/bin/env</executable>
>> >   <stdout>gramtest_pbs_stdout</stdout>
>> >   <stderr>gramtest_pbs_stderr</stderr>
>> > </job>
>> >
>> > That gives:
>> >
>> > Host key verification failed.
>> > /bin/touch: cannot touch
>> > `/home/labkey/.globus/c5acdc30-e04c-11de-9567-d32d83561bbd/exit.0':
>> > No such file or directory
>> > /var/spool/torque/mom_priv/jobs/1.domu-12-31-38-00-b4-b5.compute-1.internal.SC:
>> > 59: cannot open
>> > /home/labkey/.globus/c5acdc30-e04c-11de-9567-d32d83561bbd/exit.0: No
>> > such file
>> > [: 59: !=: unexpected operator
>> >
>> > I'm stumped - what piece of the authentication picture am I missing?
>> > And how can I identify the actor that emitted that failure message?
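>> >
>> > The best I've managed so far is correlating timestamps on both nodes
>> > right after a failing submit, roughly (log paths vary by distro):
>> >
>> >   grep ssh /var/log/secure | tail
>> >   ls -lt /var/spool/torque/mom_logs | head
>> >   ls -lt /var/spool/torque/server_logs | head
>> >
>> > but nothing conclusive yet.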
>> >
>> > Thanks,
>> >
>> > Brian Pratt
>>
>
>
