What does the container.log of the globus server say?
Steve Timm

On Fri, 12 Feb 2010, Kunal Patel wrote:
Hi,

I have set up a Condor pool: a Linux central manager that can execute and submit jobs, plus a mix of Linux and Windows machines that can also execute and submit jobs. I am now trying to use the Globus grid manager and have been through the tutorial at https://bi.offis.de/wisent/tiki-index.php?page=Condor-GT4-Admin. I have installed Globus on the central manager itself and am attempting to submit from there as well. The certificates have been created for myself and the HIGH/LOW PORT macros have been set.

I am having trouble, though. It seems that the Globus server (GRAM, I think) is never actually being started, so the job never leaves the idle state. This is part of the gridmanager log:

GAHP[4439] <- 'GT4_GRAM_PING 4 https://10.1.207.26/wsrf/services/ManagedJobFactoryService'
02/12 10:36:35 [4432] GAHP[4439] -> 'S'
02/12 10:36:35 [4432] GAHP[4439] (stderr) -> AxisFault
02/12 10:36:35 [4432] GAHP[4439] (stderr) -> faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
02/12 10:36:35 [4432] GAHP[4439] (stderr) -> faultSubcode:
02/12 10:36:35 [4432] GAHP[4439] (stderr) -> faultString: java.net.ConnectException: Connection refused
02/12 10:36:35 [4432] GAHP[4439] (stderr) -> faultActor:
02/12 10:36:35 [4432] GAHP[4439] (stderr) -> faultNode:
02/12 10:36:35 [4432] GAHP[4439] (stderr) -> faultDetail:
02/12 10:36:35 [4432] GAHP[4439] (stderr) -> {http://xml.apache.org/axis/}stackTrace:java.net.ConnectException: Connection refused
02/12 10:36:35 [4432] GAHP[4439] (stderr) -> at java.net.PlainSocketImpl.socketConnect(Native Method)
02/12 10:36:35 [4432] GAHP[4439] (stderr) -> at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:3
My Condor submit file looks like this:

######
universe = grid
grid_resource = gt4 https://10.1.207.26/wsrf/services/ManagedJobFactoryService Condor
executable = helloworld.bat
requirements = OpSys == "MSWin32_NT51" && Arch == "X86"
output = hellowin.out
error = hellowin.error
log = hellowin.log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Queue
######

I would appreciate any help with regard to what is going on.

Thanks,
Kunal
--
------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group,
Assistant Group Leader.
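
A "Connection refused" on GT4_GRAM_PING generally means nothing was listening on the address and port the client tried. The grid_resource URL in the submit file gives no port, so an https client would normally fall back to 443. The rough Python sketch below shows which host and port that URL resolves to and whether a TCP connection succeeds. The helper names endpoint_from_url and can_connect are made up for illustration, and port 8443 is only an assumption about where the GT4 container might be listening on this installation; the log does not confirm it.

```python
import socket
from urllib.parse import urlsplit

# Default ports a client falls back to when the URL names none.
DEFAULT_PORTS = {"http": 80, "https": 443}


def endpoint_from_url(url):
    """Return the (host, port) a client would connect to for this URL."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    url = "https://10.1.207.26/wsrf/services/ManagedJobFactoryService"
    host, port = endpoint_from_url(url)
    print(f"{url} -> {host}:{port}, reachable: {can_connect(host, port)}")
    # Assumption: 8443 is a guess at the container port for this site.
    print(f"{host}:8443 reachable: {can_connect(host, 8443)}")
```

If the second check succeeds and the first does not, adding the port to the URL in grid_resource would be the thing to try. If neither succeeds, the container is probably not running, and container.log is the place to look.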
