Adam

There is a post-deploy script that automatically creates the Derby database for RFT. If you want to recreate it, you first have to remove the directory rftDatabase. I don't remember the name of the script that creates the database off the top of my head.

It is really strange that RFT does not give a database error in step 1 below but does in step 4. As far as I know, there is no link between condor.log and the RFT database setup.

On Sep 12, 2012, at 3:30 PM, Adam Bazinet wrote:
> Hi Ravi,
>
> I cleaned out the installation and copied over an rftDatabase folder from a
> pristine 4.2.1 installation (something I've done to reset Derby/RFT in the
> past). Is there another way "manually create the derby database", as you say?
>
> Anyway, I tried another test and got the same results. I'll outline what I
> did, though.
>
> 1) submitted two very short jobs to condor ... the files get staged in and
> the jobs run just fine (see container.log.submission)
> 2) stopped the container
> 3) for one of the jobs, I went into var/globus-condor.log and rearranged
> lines in the JobTerminatedEvent for one of the jobs so that upon container
> restart, the SEG will detect the job is finished
> 4) restarted the container ... the job I mucked around with is detected to be
> 'Done', but I get the same RFT error upon stageout
> (container.log.after_changing_condor_log)
>
> So, any idea what can be done here? If it is a straight RFT error, how can I
> get more information about it?
>
> thanks,
> Adam
>
>
>
> On Wed, Sep 12, 2012 at 10:11 AM, Ravi K Madduri <[email protected]> wrote:
> Adam
> I can't think of any reason why messing with condor log file will affect RFT.
> Can you, may be, manually create the derby database and restart the
> container? Or alternatively, you can create a mySQL database (instructions
> are here:
> http://www.globus.org/toolkit/docs/4.2/4.2.0/data/rft/admin/rft-admin-mysql.html)
>
> On Sep 11, 2012, at 5:23 PM, Adam Bazinet wrote:
>
>> Hi Ravi,
>>
>> Just the default Derby database, which has worked fine on all of our other
>> installations. Again, I'm leaning towards something weird about what I was
>> doing messing with the Condor log file, and maybe something gets out of
>> sync, I don't know; the other possibility, of course, is that something with
>> RFT isn't configured properly and the other problem just happened to expose
>> it. If you know of anything I can do to debug, please let me know.
>> >> thanks, >> Adam >> >> >> >> On Tue, Sep 11, 2012 at 6:19 PM, Ravi K Madduri <[email protected]> wrote: >> Adam >> What database do you use for RFT? Is it Postgresql or derby? >> >> On Sep 11, 2012, at 5:08 PM, Adam Bazinet wrote: >> >>> Sorry, forgot to attach the log: >>> >>> ---------------------------------------- >>> PROCESSING INTERNAL STATE: -- StageOut -- >>> ---------------------------------------- >>> 2012-09-11T17:49:48.478-04:00 DEBUG processing.StateMachine >>> [pool-1-thread-1,processInternalState:115] Processing resource >>> d69fb7c0-fc45-11e1-93da-f931d11c3d83 in internal state StageOut >>> 2012-09-11T17:49:48.478-04:00 DEBUG handler.InternalStateHandler >>> [pool-1-thread-1,processInternalState:44] >>> [resourceKey:d69fb7c0-fc45-11e1-93da-f931d11c3d83] Start processing >>> internal state StageOut >>> 2012-09-11T17:49:48.484-04:00 DEBUG >>> PersistentManagedExecutableJobResource.d69fb7c0-fc45-11e1-93da-f931d11c3d83 >>> [pool-1-thread-1,setState:846] receiving request for state change to >>> StageOut >>> 2012-09-11T17:49:48.484-04:00 DEBUG >>> PersistentManagedExecutableJobResource.d69fb7c0-fc45-11e1-93da-f931d11c3d83 >>> [pool-1-thread-1,setState:525] Setting new state RP value StageOut >>> 2012-09-11T17:49:48.484-04:00 DEBUG >>> PersistentManagedExecutableJobResource.d69fb7c0-fc45-11e1-93da-f931d11c3d83 >>> [pool-1-thread-1,setState:854] State of job >>> {http://www.globus.org/namespaces/2008/03/gram/job}ResourceID=d69fb7c0-fc45-11e1-93da-f931d11c3d83 >>> changed to: StageOut >>> 2012-09-11T17:49:48.484-04:00 DEBUG >>> PersistentManagedExecutableJobResource.d69fb7c0-fc45-11e1-93da-f931d11c3d83 >>> [pool-1-thread-1,setState:862] Holding: false >>> 2012-09-11T17:49:48.484-04:00 DEBUG >>> PersistentManagedExecutableJobResource.d69fb7c0-fc45-11e1-93da-f931d11c3d83 >>> [pool-1-thread-1,setState:896] exitCode is null >>> 2012-09-11T17:49:48.485-04:00 DEBUG >>> PersistentManagedExecutableJobResource.d69fb7c0-fc45-11e1-93da-f931d11c3d83 >>> 
[pool-1-thread-1,setState:905] Notifying of job state change to topic >>> listeners >>> 2012-09-11T17:49:48.549-04:00 DEBUG utils.LocalStagingHelper >>> [pool-1-thread-1,submitStagingRequest:72] >>> [resourceKey:d69fb7c0-fc45-11e1-93da-f931d11c3d83] Entering >>> sumbitStagingRequest() >>> 2012-09-11T17:49:48.552-04:00 DEBUG >>> PersistentManagedExecutableJobResource.d69fb7c0-fc45-11e1-93da-f931d11c3d83 >>> [pool-1-thread-1,getStagingCredential:434] getStagingCredential() called >>> 2012-09-11T17:49:48.552-04:00 DEBUG >>> PersistentManagedExecutableJobResource.d69fb7c0-fc45-11e1-93da-f931d11c3d83 >>> [pool-1-thread-1,getResourceDatum:209] getting resource datum gridMapFile >>> 2012-09-11T17:49:48.554-04:00 DEBUG >>> PersistentManagedExecutableJobResource.d69fb7c0-fc45-11e1-93da-f931d11c3d83 >>> [pool-1-thread-1,getStagingCredential:453] staging credential endpoint: >>> Address: https://128.8.91.70:8443/wsrf/services/DelegationService >>> Reference property[0]: >>> <ns1:DelegationKey >>> xmlns:ns1="http://www.globus.org/08/2004/delegationService">d61f6430-fc45-11e1-93da-f931d11c3d83</ns1:DelegationKey> >>> >>> 2012-09-11T17:49:48.554-04:00 DEBUG utils.DelegatedCredential >>> [pool-1-thread-1,getDelegatedCredential:98] Entering >>> getDelegatedCredential() >>> 2012-09-11T17:49:48.554-04:00 DEBUG utils.DelegatedCredential >>> [pool-1-thread-1,getDelegatedCredential:112] checking for existing >>> credential listener >>> 2012-09-11T17:49:48.555-04:00 DEBUG utils.DelegatedCredential >>> [pool-1-thread-1,getDelegationKey:605] Pulled out DelegationKey: >>> d61f6430-fc45-11e1-93da-f931d11c3d83 >>> 2012-09-11T17:49:48.555-04:00 DEBUG utils.DelegatedCredential >>> [pool-1-thread-1,getDelegatedCredential:124] reusing DelegatedCredential >>> object >>> 2012-09-11T17:49:48.555-04:00 DEBUG >>> PersistentManagedExecutableJobResource.d69fb7c0-fc45-11e1-93da-f931d11c3d83 >>> [pool-1-thread-1,getStagingCredential:476] Delegated Staging Credential >>> Object: 
org.globus.exec.service.utils.DelegatedCredential@26b98a06 >>> 2012-09-11T17:49:48.555-04:00 DEBUG utils.DelegatedCredential >>> [pool-1-thread-1,getCredential:333] waiting to receive initial credential... >>> 2012-09-11T17:49:48.556-04:00 DEBUG utils.DelegatedCredential >>> [pool-1-thread-1,getCredential:341] done waiting... >>> 2012-09-11T17:49:48.559-04:00 DEBUG >>> PersistentManagedExecutableJobResource.d69fb7c0-fc45-11e1-93da-f931d11c3d83 >>> [pool-1-thread-1,getStagingCredential:479] Using proxy with subject >>> /O=Grid/OU=GlobusTest/OU=simpleCA-leucine.umiacs.umd.edu/OU=umiacs.umd.edu/CN=GT4 >>> Adminto contact the staging service >>> 2012-09-11T17:49:48.580-04:00 DEBUG utils.LocalStagingHelper >>> [pool-1-thread-1,submitStagingRequest:91] creating RFT resource >>> 2012-09-11T17:49:48.711-04:00 DEBUG >>> factory.ReliableFileTransferFactoryService [pool-1-thread-1,oldLog:168] >>> Registration to MDS enabled >>> 2012-09-11T17:49:50.981-04:00 DEBUG utils.LocalInvocationHelper >>> [pool-1-thread-1,getServiceObject:403] >>> className=org.globus.transfer.reliable.service.factory.ReliableFileTransferFactoryService >>> handlerClass=org.globus.axis.providers.RPCProvider >>> 2012-09-11T17:49:50.981-04:00 DEBUG utils.LocalInvocationHelper >>> [pool-1-thread-1,getServiceObject:416] getting the service object >>> 2012-09-11T17:49:50.982-04:00 DEBUG utils.LocalInvocationHelper >>> [pool-1-thread-1,getServiceObject:422] caching service object for service >>> ReliableFileTransferFactoryService >>> 2012-09-11T17:49:50.983-04:00 DEBUG utils.LocalInvocationHelper >>> [pool-1-thread-1,getMethod:457] caching method >>> ReliableFileTransferFactoryService.createReliableFileTransfer >>> 2012-09-11T17:49:50.984-04:00 DEBUG >>> factory.ReliableFileTransferFactoryService [pool-1-thread-1,oldLog:168] >>> PerformanceLog createReliableFileTransfer() enter >>> 2012-09-11T17:49:50.984-04:00 DEBUG >>> factory.ReliableFileTransferFactoryService [pool-1-thread-1,oldLog:168] Got >>> a transfer 
request >>> 2012-09-11T17:49:51.049-04:00 DEBUG service.ReliableFileTransferHome >>> [pool-1-thread-1,oldLog:168] Loading the RFT home's list of resource keys >>> 2012-09-11T17:49:51.051-04:00 DEBUG database.RFTDatabaseSetup >>> [pool-1-thread-1,oldLog:168] getDBConnection() enter >>> 2012-09-11T17:49:51.051-04:00 WARN service.ReliableFileTransferHome >>> [pool-1-thread-1,oldLog:190] All RFT requests will fail and all GRAM jobs >>> that require file staging will fail.Database driver is not initialized, >>> Need to setup database >>> 2012-09-11T17:49:51.052-04:00 DEBUG service.ReliableFileTransferHome >>> [pool-1-thread-1,oldLog:164] All RFT requests will fail and all GRAM jobs >>> that require file staging will fail.Database driver is not initialized, >>> Need to setup database >>> org.globus.transfer.reliable.service.database.RftDBException: Database >>> driver is not initialized, Need to setup database >>> at >>> org.globus.transfer.reliable.service.database.RFTDatabaseSetup.getDBConnection(RFTDatabaseSetup.java:249) >>> at >>> org.globus.transfer.reliable.service.database.ReliableFileTransferDbAdapter.getActiveRequestIds(ReliableFileTransferDbAdapter.java:917) >>> at >>> org.globus.transfer.reliable.service.ReliableFileTransferResource.getStoredResourceKeyValues(ReliableFileTransferResource.java:544) >>> at >>> org.globus.transfer.reliable.service.ReliableFileTransferHome.initialize(ReliableFileTransferHome.java:69) >>> at >>> org.globus.wsrf.jndi.BasicBeanFactory.getObjectInstance(BasicBeanFactory.java:54) >>> at >>> org.globus.wsrf.jndi.BeanFactory.getInstance(BeanFactory.java:118) >>> at org.globus.wsrf.jndi.BeanFactory.access$100(BeanFactory.java:40) >>> at >>> org.globus.wsrf.jndi.BeanFactory$GetInstanceAction.run(BeanFactory.java:142) >>> at java.security.AccessController.doPrivileged(Native Method) >>> at javax.security.auth.Subject.doAs(Subject.java:416) >>> at org.globus.gsi.jaas.GlobusSubject.runAs(GlobusSubject.java:60) >>> at 
org.globus.gsi.jaas.JaasSubject.doAs(JaasSubject.java:100) >>> at >>> org.globus.wsrf.jndi.BeanFactory.getObjectInstance(BeanFactory.java:90) >>> at >>> org.apache.naming.factory.ResourceFactory.getObjectInstance(ResourceFactory.java:135) >>> at >>> javax.naming.spi.NamingManager.getObjectInstance(NamingManager.java:321) >>> at org.apache.naming.NamingContext.lookup(NamingContext.java:827) >>> at org.apache.naming.NamingContext.lookup(NamingContext.java:155) >>> at >>> org.apache.naming.SynchronizedContext.lookup(SynchronizedContext.java:69) >>> at org.apache.naming.NamingContext.lookup(NamingContext.java:815) >>> at org.apache.naming.NamingContext.lookup(NamingContext.java:155) >>> at >>> org.apache.naming.SynchronizedContext.lookup(SynchronizedContext.java:69) >>> at org.apache.naming.NamingContext.lookup(NamingContext.java:815) >>> at org.apache.naming.NamingContext.lookup(NamingContext.java:155) >>> at >>> org.apache.naming.SynchronizedContext.lookup(SynchronizedContext.java:69) >>> at org.apache.naming.NamingContext.lookup(NamingContext.java:815) >>> at org.apache.naming.NamingContext.lookup(NamingContext.java:155) >>> at >>> org.apache.naming.SynchronizedContext.lookup(SynchronizedContext.java:69) >>> at org.apache.naming.NamingContext.lookup(NamingContext.java:815) >>> at org.apache.naming.NamingContext.lookup(NamingContext.java:168) >>> at >>> org.apache.naming.SynchronizedContext.lookup(SynchronizedContext.java:81) >>> at javax.naming.InitialContext.lookup(InitialContext.java:409) >>> at >>> org.apache.naming.SelectorContext.lookup(SelectorContext.java:145) >>> at javax.naming.InitialContext.lookup(InitialContext.java:409) >>> at >>> org.globus.transfer.reliable.service.factory.ReliableFileTransferFactoryService.createReliableFileTransfer(ReliableFileTransferFactoryService.java:247) >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>> at >>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>> at >>> 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>> at java.lang.reflect.Method.invoke(Method.java:616) >>> at >>> org.globus.exec.service.exec.utils.LocalInvocationHelper.callService(LocalInvocationHelper.java:140) >>> at >>> org.globus.exec.service.exec.utils.LocalStagingHelper.submitStagingRequest(LocalStagingHelper.java:95) >>> at >>> org.globus.exec.service.exec.processing.handler.StageOutStateHandler.process(StageOutStateHandler.java:79) >>> at >>> org.globus.exec.service.exec.processing.handler.InternalStateHandler.processInternalState(InternalStateHandler.java:49) >>> at >>> org.globus.exec.service.exec.processing.StateMachine.processInternalState(StateMachine.java:121) >>> at >>> org.globus.exec.service.exec.processing.StateProcessingTask.run(StateProcessingTask.java:82) >>> at >>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) >>> at >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) >>> at java.lang.Thread.run(Thread.java:679) >>> 2012-09-11T17:49:51.062-04:00 DEBUG database.RFTDatabaseSetup >>> [pool-1-thread-1,oldLog:168] getDBConnection() enter >>> 2012-09-11T17:49:51.062-04:00 ERROR >>> factory.ReliableFileTransferFactoryService [pool-1-thread-1,oldLog:175] >>> Unable to create RFT resource >>> org.globus.transfer.reliable.service.database.RftDBException: Database >>> driver is not initialized, Need to setup database >>> at >>> org.globus.transfer.reliable.service.database.RFTDatabaseSetup.getDBConnection(RFTDatabaseSetup.java:249) >>> at >>> org.globus.transfer.reliable.service.database.ReliableFileTransferDbAdapter.storeTransferRequest(ReliableFileTransferDbAdapter.java:204) >>> at >>> org.globus.transfer.reliable.service.ReliableFileTransferResource.<init>(ReliableFileTransferResource.java:191) >>> at >>> org.globus.transfer.reliable.service.ReliableFileTransferHome.create(ReliableFileTransferHome.java:126) >>> at >>> 
org.globus.transfer.reliable.service.factory.ReliableFileTransferFactoryService.createReliableFileTransfer(ReliableFileTransferFactoryService.java:249) >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>> at >>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>> at >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>> at java.lang.reflect.Method.invoke(Method.java:616) >>> at >>> org.globus.exec.service.exec.utils.LocalInvocationHelper.callService(LocalInvocationHelper.java:140) >>> at >>> org.globus.exec.service.exec.utils.LocalStagingHelper.submitStagingRequest(LocalStagingHelper.java:95) >>> at >>> org.globus.exec.service.exec.processing.handler.StageOutStateHandler.process(StageOutStateHandler.java:79) >>> at >>> org.globus.exec.service.exec.processing.handler.InternalStateHandler.processInternalState(InternalStateHandler.java:49) >>> at >>> org.globus.exec.service.exec.processing.StateMachine.processInternalState(StateMachine.java:121) >>> at >>> org.globus.exec.service.exec.processing.StateProcessingTask.run(StateProcessingTask.java:82) >>> at >>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) >>> at >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) >>> at java.lang.Thread.run(Thread.java:679) >>> 2012-09-11T17:49:51.070-04:00 DEBUG utils.FaultUtils >>> [pool-1-thread-1,makeFault:541] Fault Class: class >>> org.globus.exec.generated.StagingFaultType >>> 2012-09-11T17:49:51.070-04:00 DEBUG utils.FaultUtils >>> [pool-1-thread-1,makeFault:542] Resource Key: >>> {http://www.globus.org/namespaces/2008/03/gram/job}ResourceID=d69fb7c0-fc45-11e1-93da-f931d11c3d83 >>> 2012-09-11T17:49:51.070-04:00 DEBUG utils.FaultUtils >>> [pool-1-thread-1,makeFault:543] Description: Staging error for RSL element >>> fileStageOut. 
>>> 2012-09-11T17:49:51.070-04:00 DEBUG utils.FaultUtils
>>> [pool-1-thread-1,makeFault:545] Cause: java.rmi.RemoteException
>>> 2012-09-11T17:49:51.070-04:00 DEBUG utils.FaultUtils
>>> [pool-1-thread-1,makeFault:549] State when failure occurred: StageOut
>>> 2012-09-11T17:49:51.070-04:00 DEBUG utils.FaultUtils
>>> [pool-1-thread-1,makeFault:551] Script Command: StageOut
>>>
>>>
>>> On Tue, Sep 11, 2012 at 6:05 PM, Adam Bazinet <[email protected]>
>>> wrote:
>>> Dear list and particularly GT 4.x and Condor GRAM users,
>>>
>>> To those of you who may be interested, I have confirmed that the Condor SEG
>>> is sensitive to the order and formatting of the elements in
>>> globus-condor.log. If I shift them around manually so that they conform to
>>> what a <c> block looks like with an older version of Condor, Globus
>>> properly picks up the fact that my jobs are complete. However, as it goes
>>> to stage out the files, I get an RFT failure saying the "Database driver is
>>> not initialized, Need to setup database". My question is: do I have
>>> something wrong with RFT in this Globus installation? This seems unlikely
>>> since I have several others configured the same way without any issue, and
>>> there are no problems staging files in. Does it have to do with my manually
>>> mucking around with the globus-condor.log file? A relevant bit of the
>>> container.log is below. I'd appreciate your input because regarding this
>>> and also a possible workaround for the SEG parsing problem (Condor version
>>> downgrade, modifications to SEG parsing code, or otherwise).
>>>
>>> thanks,
>>> Adam
>>>
>>>
>>>
>>> On Thu, Aug 30, 2012 at 4:09 PM, Adam Bazinet <[email protected]>
>>> wrote:
>>> Hi all,
>>>
>>> I just installed a new GT 4.2.1 on a Condor resource, and while jobs run
>>> and complete in Condor fine, the finished status is not being picked up by
>>> Globus appropriately.
>>>
>>> Is the GT 4.2.1 sensitive to the order/formatting of the elements in
>>> globus-condor.log?
>>>
>>> For example, here is a JobTerminatedEvent from globus-condor.log from an
>>> older Condor/Globus installation that works properly:
>>>
>>> <c>
>>> <a n="MyType"><s>JobTerminatedEvent</s></a>
>>> <a n="EventTypeNumber"><i>5</i></a>
>>> <a n="MyType"><s>JobTerminatedEvent</s></a>
>>> <a n="EventTime"><s>2012-08-29T00:32:04</s></a>
>>> <a n="Cluster"><i>18720</i></a>
>>> <a n="Proc"><i>36</i></a>
>>> <a n="Subproc"><i>0</i></a>
>>> <a n="TerminatedNormally"><b v="t"/></a>
>>> <a n="ReturnValue"><i>0</i></a>
>>> <a n="RunLocalUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
>>> <a n="RunRemoteUsage"><s>Usr 0 07:01:33, Sys 0 00:00:13</s></a>
>>> <a n="TotalLocalUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
>>> <a n="TotalRemoteUsage"><s>Usr 0 07:01:33, Sys 0 00:00:13</s></a>
>>> <a n="SentBytes"><r>3.244000000000000E+04</r></a>
>>> <a n="ReceivedBytes"><r>4.281464000000000E+06</r></a>
>>> <a n="TotalSentBytes"><r>3.244000000000000E+04</r></a>
>>> <a n="TotalReceivedBytes"><r>5.565903200000000E+07</r></a>
>>> </c>
>>>
>>> and here is the one from the new resource that is NOT working properly:
>>>
>>> <c>
>>> <a n="MyType"><s>JobTerminatedEvent</s></a>
>>> <a n="TotalLocalUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
>>> <a n="Proc"><i>0</i></a>
>>> <a n="EventTime"><s>2012-08-28T16:07:17</s></a>
>>> <a n="TotalRemoteUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
>>> <a n="TotalReceivedBytes"><r>2.231476000000000E+06</r></a>
>>> <a n="ReturnValue"><i>0</i></a>
>>> <a n="RunRemoteUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
>>> <a n="RunLocalUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
>>> <a n="SentBytes"><r>3.979500000000000E+04</r></a>
>>> <a n="Cluster"><i>20</i></a>
>>> <a n="TotalSentBytes"><r>3.979500000000000E+04</r></a>
>>> <a n="Subproc"><i>0</i></a>
>>> <a n="CurrentTime"><e>time()</e></a>
>>> <a n="EventTypeNumber"><i>5</i></a>
>>> <a n="ReceivedBytes"><r>2.231476000000000E+06</r></a>
>>> <a n="TerminatedNormally"><b v="t"/></a>
>>> </c>
>>>
>>> Basically, the only variable that's different about this resource is that
>>> it's running a newer version of Condor. My hunch is that broke
>>> compatibility somewhere along the way. Can someone confirm this, or provide
>>> another mechanism to debug? I'm attaching the container.log from the
>>> resource in question, which has GRAM debugging enabled as some jobs came
>>> in. It didn't really show me anything, though.
>>>
>>> thanks,
>>> Adam
>>>
>>>
>>>
>>
>> --Ravi
>>
>>
>
> --Ravi
>
>
> <container.log.submission><container.log.after_changing_condor_log>

--Ravi
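
The manual rearrangement described in step 3 of the thread could be scripted roughly as follows. This is only a sketch in Python under stated assumptions: the target order is copied from the older, working JobTerminatedEvent quoted in the thread, each attribute is assumed to sit on its own line, and OLD_ORDER and reorder_block are hypothetical names that are not part of Globus or Condor. Whether the SEG needs exactly this order, or whether extra attributes such as CurrentTime must be dropped, is not established in the thread.

```python
import re

# Attribute order copied from the older, working JobTerminatedEvent quoted in
# the thread. ASSUMPTION: this is the order the Condor SEG expects.
OLD_ORDER = [
    "MyType", "EventTypeNumber", "EventTime", "Cluster", "Proc", "Subproc",
    "TerminatedNormally", "ReturnValue", "RunLocalUsage", "RunRemoteUsage",
    "TotalLocalUsage", "TotalRemoteUsage", "SentBytes", "ReceivedBytes",
    "TotalSentBytes", "TotalReceivedBytes",
]

# Matches one attribute line such as: <a n="Cluster"><i>20</i></a>
ATTR_RE = re.compile(r'^\s*<a n="([^"]+)">.*</a>\s*$')


def reorder_block(lines):
    """Return one <c>...</c> block with its <a> lines in OLD_ORDER.

    Hypothetical helper: `lines` is the list of text lines of a single block,
    including the <c> and </c> lines, which are re-emitted unchanged in spirit.
    """
    attrs = []
    for line in lines:
        match = ATTR_RE.match(line)
        if match:
            attrs.append((match.group(1), line.rstrip("\n")))

    rank = {name: i for i, name in enumerate(OLD_ORDER)}
    # sorted() is stable, so attributes missing from OLD_ORDER (for example
    # CurrentTime) keep their relative order and end up after the known ones.
    ordered = sorted(attrs, key=lambda item: rank.get(item[0], len(OLD_ORDER)))
    return ["<c>"] + [text for _, text in ordered] + ["</c>"]


if __name__ == "__main__":
    new_style = [
        "<c>",
        '<a n="MyType"><s>JobTerminatedEvent</s></a>',
        '<a n="Proc"><i>0</i></a>',
        '<a n="EventTime"><s>2012-08-28T16:07:17</s></a>',
        '<a n="Cluster"><i>20</i></a>',
        '<a n="CurrentTime"><e>time()</e></a>',
        '<a n="EventTypeNumber"><i>5</i></a>',
        "</c>",
    ]
    print("\n".join(reorder_block(new_style)))
```

Attributes that do not appear in the older block are kept and moved to the end rather than deleted. As in step 2 of the thread, this kind of edit is best tried on a copy of globus-condor.log with the container stopped.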
