Hi,

After fiddling around with netstat (I was actually trying to produce a log to help with the analysis and stumbled upon tcp6 entries), I found a workaround: disabling IPv6 on the VM lets the remote jobs reach the "done" state.

I could probably also add IPv6 entries to my hosts file; I will try that later. I still have no idea why IPv6 was apparently used only for the notification part. Do you have any idea why?
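
A quick way to see whether a host name resolves differently over IPv4 and IPv6 (which is what a hosts-file fix would change) is a small resolver check. The following is only a minimal sketch using Python's standard socket module; the function names resolve_by_family and report are illustrative and not part of Globus or GridWay, and the check says nothing about which address family the Java container actually prefers.

```python
import socket


def resolve_by_family(host):
    """Return the addresses the system resolver yields for host, per family."""
    families = {"IPv4": socket.AF_INET, "IPv6": socket.AF_INET6}
    result = {}
    for label, family in families.items():
        try:
            infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
        except socket.gaierror:
            infos = []  # the resolver has no address of this family for host
        # info[4] is the sockaddr tuple; its first element is the address string
        result[label] = sorted({info[4][0] for info in infos})
    return result


def report(hosts):
    """Build one human-readable line per host and address family."""
    lines = []
    for host in hosts:
        for label, addrs in resolve_by_family(host).items():
            lines.append(f"{host} {label}: {', '.join(addrs) or '(none)'}")
    return lines
```

Running print("\n".join(report([...]))) on both machines, with each machine's own name and the other machine's name in the list, shows whether a name maps to an IPv6 address on one side that the other side is not listening on.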

Also, the error message from globusrun-ws remains.

Best regards,
Timo



On 07.07.2011 15:38, Eduardo Huedo wrote:
Hi,

I don't think it is related. Could you send me the full output?

Thanks,

Dr. Eduardo Huedo Cuesta
Associate Professor (Profesor Titular), Universidad Complutense de Madrid
http://dsa-research.org/ehuedo



2011/7/7 Timo Henne <[email protected]>

    Dear Eduardo,

    With globusrun-ws -submit -F <remote_machine> -dbg -c /bin/ls

    I get this message:
    ...
    Current job state: Done
    Destroying job...
    === REQUEST MESSAGE (length 427) (time 1310039859.775479000) ===
    <ns00:Envelope xmlns:ns00="http://schemas.xmlsoap.org/soap/envelope/"><ns00:Header></ns00:Header><ns00:Body><ns01:terminate xmlns:ns01="http://www.globus.org/namespaces/2008/03/gram/job/terminate"><ns01:destroyAfterCleanup>true</ns01:destroyAfterCleanup><ns01:continueNotifying>false</ns01:continueNotifying><ns01:destroyDelegatedCredentials>false</ns01:destroyDelegatedCredentials></ns01:terminate></ns00:Body></ns00:Envelope>
    ----------------------------------------------
    Failed.
    globusrun-ws: Unable to destroy job: Error: invalid or unknown job
    reference. Unable to destroy job. It may have expired or already
    been destroyed.

    and this error in container.log:
    2011-07-07T13:57:40.684+02:00 ERROR providers.TerminateManagedJobProvider [ServiceThread-58,logError:184] Job resource 54869ce0-a890-11e0-ae4b-edf1d7e307e9 not found.

    ...which doesn't really help me. Does it mean anything concerning my
    problem?

    Timo


    On 07.07.2011 13:26, Eduardo Huedo wrote:

        Dear Timo,

        Since only remote notifications fail, I am quite sure that the
        problem is network-related. But maybe you can get more
        information from a simple job submission with globusrun-ws
        using -dbg.

        Regarding IGE RT, you can use a guest account
        (http://www.ige-project.eu/hub/rt/rtguest) or request your own
        (http://www.ige-project.eu/hub/rt).

        Regards,

        Dr. Eduardo Huedo Cuesta
        Associate Professor (Profesor Titular), Universidad Complutense de Madrid
        http://dsa-research.org/ehuedo



        2011/7/7 Timo Henne <[email protected]>

            Dear Eduardo,

            Thanks for your answer. However, I have now made sure that
            the variable is set. Following a page I found, I also set
            tcp.port.range in ~/.globus/cog.properties, and I even
            disabled the firewall(s), yet to no avail: the same warning
            message appears. Do you have any more ideas on what to try?
            Could the VM be the problem? What could I try to test this?

            I signed up for egcf, but for the ticketing system I need an
            account, which I don't have (yet), and I don't know where to
            get one.

            Thanks,
            Timo


            On 07.07.2011 12:14, Eduardo Huedo wrote:

                Dear Timo,

                Since it says "Connection refused", first of all, ensure
                that you don't have any firewall problem. For example,
                check that GLOBUS_TCP_PORT_RANGE is appropriately set in
                the client. As you probably know, the client starts a
                small container to receive notifications on a user-space
                dynamic port. This variable limits the range of these
                dynamic ports, so only that range needs to be open in
                the firewall.
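
The check described above can be sketched in a few lines of Python. This is only an illustration, assuming the variable holds the usual "min,max" form (for example 40000,40100); the helper names parse_port_range, range_from_env and first_bindable_port are hypothetical and not part of Globus. It verifies that the variable is set to a sane range and that at least one port in that range can be bound locally. It cannot tell whether a firewall between the two machines lets traffic through.

```python
import os
import socket


def parse_port_range(value):
    """Parse a 'min,max' (or 'min max') string into a (low, high) tuple."""
    parts = value.replace(",", " ").split()
    if len(parts) != 2:
        raise ValueError(f"expected two ports, got {value!r}")
    low, high = (int(p) for p in parts)
    if not (0 < low <= high <= 65535):
        raise ValueError(f"invalid port range {low}-{high}")
    return low, high


def range_from_env(name="GLOBUS_TCP_PORT_RANGE"):
    """Return the parsed range from the environment, or None if unset."""
    value = os.environ.get(name)
    return parse_port_range(value) if value else None


def first_bindable_port(low, high, address=""):
    """Return the first port in [low, high] that can be bound, or None."""
    for port in range(low, high + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((address, port))
            except OSError:
                continue  # port already in use or not permitted
            return port
    return None
```

If range_from_env() returns None in the shell that runs the client, the notification container is free to pick a port outside the range that the firewall allows.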

                For your information, the IGE project
                (http://www.ige-project.eu) is now providing support (and
                much more) for Globus in Europe. For example, we have a
                request tracking system (http://rt.ige-project.eu) where
                you can open a ticket to request support, suggest
                improvements or report bugs.

                Regards,

                Dr. Eduardo Huedo Cuesta
                Associate Professor (Profesor Titular), Universidad Complutense de Madrid
                http://dsa-research.org/ehuedo



                2011/7/7 Timo Henne <[email protected]>


                    Hi,

                    My previous two mails somehow didn't make it to the
                    list, so here is another attempt:

                    I am trying to use GridWay (5.6.1) to schedule simple
                    test jobs (/bin/ls) across two machines, both with
                    GT 4.2.1 installed and running. One of the machines
                    runs Debian; the other is a VM running Ubuntu.
                    Communication and authentication apparently work
                    fine: the machines see and trust each other, and the
                    jobs get scheduled. However, in *both directions*,
                    only the jobs that run on the local machine (the one
                    where they are started with gwsubmit) actually reach
                    "done"; the others remain in the "wrap pend" state.
                    Apparently they are executed correctly on the remote
                    machine, since the result output is there, but
                    somehow the notification to the originating machine
                    fails. After searching the list, enabling debugging,
                    and digging through the logs, I found this
                    warning/exception at the end of the Globus
                    container.log on the remote machine:

                    <...snip...>
                    2011-07-01T15:40:41.231+02:00 INFO  impl.DefaultIndexService [ServiceThread-58,performDefaultRegistrations:261] guid=b646c8e0-a3e7-11e0-b059-b76342becd29 event=org.globus.mds.index.performDefaultRegistrations.end status=0
                    2011-07-01T15:41:21.751+02:00 INFO  PersistentManagedExecutableJobResource.ce605ef0-a3e7-11e0-b059-b76342becd29 [ServiceThread-57,start:761] Job ce605ef0-a3e7-11e0-b059-b76342becd29 with client submission-id null accepted for local user 'the'
                    2011-07-01T15:41:22.032+02:00 INFO  handler.SubmitStateHandler [pool-1-thread-7,process:172] Job ce605ef0-a3e7-11e0-b059-b76342becd29 submitted with local job ID 'ce9b08e8-a3e7-11e0-bcc8-b7ebd4913b23:17697'
                    2011-07-01T15:41:23.327+02:00 DEBUG impl.SimpleSubscriptionTopicListener [pool-1-thread-1,setPort:287] Security properties not null: not secure conv
                    2011-07-01T15:41:23.327+02:00 DEBUG impl.SimpleSubscriptionTopicListener [pool-1-thread-1,setPort:314] set port with false
                    2011-07-01T15:41:23.366+02:00 DEBUG impl.SimpleSubscriptionTopicListener [pool-1-thread-1,setPort:290] Setting security properties
                    2011-07-01T15:41:23.482+02:00 DEBUG impl.SimpleSubscriptionTopicListener [pool-1-thread-3,setPort:287] Security properties not null: not secure conv
                    2011-07-01T15:41:23.483+02:00 DEBUG impl.SimpleSubscriptionTopicListener [pool-1-thread-3,setPort:314] set port with false
                    2011-07-01T15:41:23.483+02:00 DEBUG impl.SimpleSubscriptionTopicListener [pool-1-thread-3,setPort:290] Setting security properties
                    2011-07-01T15:41:23.505+02:00 WARN  impl.SimpleSubscriptionTopicListener [pool-1-thread-3,topicChanged:129] [JWSCORE-169] Failed to send notification for subscription with key 'B26B14DD21498C52B1E38CC2F042B0AF0E65BAE6+ce647da0-a3e7-11e0-b059-b76342becd29': java.net.ConnectException: Connection refused
                    2011-07-01T15:41:23.506+02:00 DEBUG impl.SimpleSubscriptionTopicListener [pool-1-thread-3,topicChanged:132]
                    javax.xml.rpc.JAXRPCException: java.net.ConnectException: Connection refused
                            at org.apache.axis.client.Call.invokeOneWay(Call.java:1871)
                            at org.oasis.wsn.NotificationConsumerSOAPBindingStub.notify(NotificationConsumerSOAPBindingStub.java:701)
                            at org.globus.wsrf.impl.SimpleSubscriptionTopicListener.notify(SimpleSubscriptionTopicListener.java:256)
                            at org.globus.wsrf.impl.SimpleSubscriptionTopicListener.topicChanged(SimpleSubscriptionTopicListener.java:123)
                            at org.globus.wsrf.impl.SimpleTopic.topicChanged(SimpleTopic.java:205)
                            at org.globus.wsrf.impl.SimpleTopic.notify(SimpleTopic.java:112)
                            at org.globus.exec.service.exec.ManagedExecutableJobResource.setState(ManagedExecutableJobResource.java:909)
                            at org.globus.exec.service.exec.processing.handler.CleanUpStateHandler.process(CleanUpStateHandler.java:56)
                            at org.globus.exec.service.exec.processing.handler.InternalStateHandler.processInternalState(InternalStateHandler.java:49)
                            at org.globus.exec.service.exec.processing.StateMachine.processInternalState(StateMachine.java:121)
                            at org.globus.exec.service.exec.processing.StateProcessingTask.run(StateProcessingTask.java:82)
                            at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                            at java.lang.Thread.run(Thread.java:662)
                    <...snip...>

                    The VM has a static IP. Do you have any clue what the
                    problem could be? Is there anything else I should
                    provide for analysis?

                    Thanks,
                    Timo



--
--
Timo Henne
Research and Development Department (RDD)
State and University Library
Georg-August-Universitaet Goettingen
37073 Goettingen
Germany

Phone: +49 551 39 3883
http://www.sub.uni-goettingen.de/
