Martin Feller wrote:
> Hi,
>
> The problem you describe, and which is summarized in the bug you mention,
> is an architectural problem in WS-GRAM in 4.0.
> We fixed it in the 4.2 branch. We had to change the interface to fix it,
> which is why we can't port the fix back to the 4.0 branch.
> If you can upgrade to the 4.2 series I'd recommend this.
>
> With 4.0.x there is currently no other way than:
> 1. Stop the container
> 2. Delete the problematic job from the persistence directory (by default
>    ~/.globus of the user who runs the container).
>    In your case: remove the file
>
>    ~containeruser/.globus/<hostname>-<port>/ManagedExecutableJobResourceStateType/1748b3d0-8c4b-11de-8543-b8f655c16264.xml

I'm sorry, the path I gave wasn't correct: the persistence directory is by
default ~containeruser/.globus/persisted, so the file to remove is

~containeruser/.globus/persisted/<hostname>-<port>/ManagedExecutableJobResourceStateType/1748b3d0-8c4b-11de-8543-b8f655c16264.xml

-Martin

> 3. Restart the container.
>
> -Martin
>
> Hazlewood, Victor Gene wrote:
>> Hey GTers,
>>
>> Running WSRF v 4.0.8-r2 on a Cray XT5. Have a user job that looks like
>> it has gone into an unresolvable state and the log file is filling up
>> with messages about not being able to resolve the FailureFileCleanUp
>> state. Anyone have any suggestions how to get rid of this? Have
>> looked at the documentation (nothing I found covers this), looked at
>> bugzilla (http://bugzilla.mcs.anl.gov/globus/show_bug.cgi?id=5247 is
>> close but says it will be fixed in a future release, but gives no
>> instructions how to resolve it currently). I'm running out of ideas.
>>
>> The recurring messages are
>>
>> 2009-08-29 12:40:02,267 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,getInternalState:1666] getting resource datum internalState
>> 2009-08-29 12:40:02,267 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,remove:285] Waiting to be Done or Failed. Current state: FailureFileCleanUp
>>
>> Any help on how to resolve this would be appreciated (besides the "it is
>> fixed in the next release" type of resolution).
>>
>> Below are the complete job entries for the job.
>>
>> -Victor
>>
>> Victor Hazlewood, CISSP
>> Senior HPC Systems Analyst
>> National Institute for Computational Science
>> University of Tennessee
>> http://www.nics.tennessee.edu/ <http://www.nics.utk.edu/>
>>
>> Complete log file entry:
>>
>> 2009-08-28 20:13:32,174 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:142] Entering initialize()
>> 2009-08-28 20:13:32,175 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:147] at super.initialize()
>> 2009-08-28 20:13:32,180 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:153] at initSecurity()
>> 2009-08-28 20:13:32,180 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:316] Entering initSecurity()
>> 2009-08-28 20:13:32,182 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:338] resource credential subject:
>> 2009-08-28 20:13:32,183 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:346] setting resource securty grid map...
>> 2009-08-28 20:13:32,183 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:356] Leaving initSecurity()
>> 2009-08-28 20:13:32,186 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initVariableMap:704] GLOBUS_SCRATCH_DIR:${GLOBUS_USER_HOME}/.globus/scratch
>> 2009-08-28 20:13:32,370 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1290] resolving variables in attribute environment
>> 2009-08-28 20:13:32,370 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1295] looking at string ${GLOBUS_USER_HOME}
>> 2009-08-28 20:13:32,370 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1296] found $ at index 0
>> 2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1302] found '{'---looks like a reference
>> 2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_HOME in {GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch, GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2, GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264, GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}
>> 2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_HOME to value /nics/c/home/turuncu
>> 2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1392] Final string is /nics/c/home/turuncu
>> 2009-08-28 20:13:32,372 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1290] resolving variables in attribute environment
>> 2009-08-28 20:13:32,372 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1295] looking at string ${GLOBUS_USER_NAME}
>> 2009-08-28 20:13:32,372 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1296] found $ at index 0
>> 2009-08-28 20:13:32,372 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1302] found '{'---looks like a reference
>> 2009-08-28 20:13:32,373 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_NAME in {GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch, GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2, GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264, GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}
>> 2009-08-28 20:13:32,373 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_NAME to value turuncu
>> 2009-08-28 20:13:32,373 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1392] Final string is turuncu
>> 2009-08-28 20:13:32,373 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1290] resolving variables in attribute environment
>> 2009-08-28 20:13:32,374 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1295] looking at string ${GLOBUS_SCRATCH_DIR}
>> 2009-08-28 20:13:32,374 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1296] found $ at index 0
>> 2009-08-28 20:13:32,374 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1302] found '{'---looks like a reference
>> 2009-08-28 20:13:32,374 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1348] looking up GLOBUS_SCRATCH_DIR in {GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch, GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2, GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264, GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}
>> 2009-08-28 20:13:32,375 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1353] mapped GLOBUS_SCRATCH_DIR to value ${GLOBUS_USER_HOME}/.globus/scratch
>> 2009-08-28 20:13:32,375 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1295] looking at string ${GLOBUS_USER_HOME}/.globus/scratch
>> 2009-08-28 20:13:32,375 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1296] found $ at index 0
>> 2009-08-28 20:13:32,375 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1302] found '{'---looks like a reference
>> 2009-08-28 20:13:32,376 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_HOME in {GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch, GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2, GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264, GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}
>> 2009-08-28 20:13:32,376 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_HOME to value /nics/c/home/turuncu
>> 2009-08-28 20:13:32,376 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1392] Final string is /nics/c/home/turuncu/.globus/scratch
>> 2009-08-28 20:13:32,377 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initExtraPerlAttributes:588] Adding extra attributes to the Perl job attribute map
>> 2009-08-28 20:13:32,377 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initExtraPerlAttributes:615] checking for condorness of PBS
>> 2009-08-28 20:13:32,421 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:171] Perl Job Description: $description = {
>>   jobdir => [ '/nics/c/home/turuncu/.globus/1748b3d0-8c4b-11de-8543-b8f655c16264' ],
>>   environment => [ [ 'GLOBUS_LOCATION', '/usr/local/globus-wsrf-4.0.8-r2' ], [ 'X509_CERT_DIR', '/etc/grid-security/certificates' ], [ 'X509_USER_PROXY', '' ], [ 'X509_USER_CERT', '' ], [ 'X509_USER_KEY', '' ], [ 'HOME', '/nics/c/home/turuncu' ], [ 'LOGNAME', 'turuncu' ], [ 'SCRATCH_DIRECTORY', '/nics/c/home/turuncu/.globus/scratch' ], [ 'JAVA_HOME', '/opt/java/jdk1.6.0_05/jre' ], [ 'GLOBUS_GRAM_JOB_HANDLE', 'https://grid.nics.utk.edu:4321/wsrf/services/ManagedExecutableJobService?1748b3d0-8c4b-11de-8543-b8f655c16264' ], ],
>> 2009-08-28 20:13:32,421 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:178] Leaving initialize()
>> 2009-08-28 20:13:32,429 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,getInternalState:1666] getting resource datum internalState
>> 2009-08-28 20:13:32,429 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,remove:275] Remove called with external state Done and internal state FailureFileCleanUp
>> 2009-08-28 20:13:32,429 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,remove:285] Waiting to be Done or Failed. Current state: FailureFileCleanUp
>> 2009-08-28 20:13:34,432 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,getInternalState:1666] getting resource datum internalState
>> 2009-08-28 20:13:34,432 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,remove:285] Waiting to be Done or Failed. Current state: FailureFileCleanUp
>>
>> (last two messages repeated 29536 times)
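
For anyone who wants to script step 2 of the workaround (deleting the persisted job file while the container is stopped), here is a minimal sketch in Python. It only assumes the default layout given in the corrected path above: `.globus/persisted/<hostname>-<port>/ManagedExecutableJobResourceStateType/<job-id>.xml` under the home directory of the user who runs the container. The function names, the `dry_run` switch, and the hostname, port, and home directory in the usage part are hypothetical placeholders, not values from this thread or from any Globus tool. Stopping and restarting the container is not covered and has to be done separately.

```python
from pathlib import Path

# Directory name taken from the path quoted in the thread.
RESOURCE_TYPE = "ManagedExecutableJobResourceStateType"


def persisted_job_file(container_home, hostname, port, job_id):
    """Build the path of a persisted job resource file.

    Assumes the default persistence directory (<home>/.globus/persisted).
    hostname and port must match the <hostname>-<port> directory that
    actually exists on your system; check it by hand first.
    """
    return (
        Path(container_home)
        / ".globus"
        / "persisted"
        / f"{hostname}-{port}"
        / RESOURCE_TYPE
        / f"{job_id}.xml"
    )


def remove_persisted_job(job_file, dry_run=True):
    """Delete the persisted job file; return True only if it was removed.

    With dry_run=True (the default) nothing is deleted and the path that
    would be removed is printed instead. Run this only while the
    container is stopped.
    """
    job_file = Path(job_file)
    if not job_file.is_file():
        print(f"not found: {job_file}")
        return False
    if dry_run:
        print(f"would remove: {job_file}")
        return False
    job_file.unlink()
    return True


if __name__ == "__main__":
    # Hypothetical home directory, hostname, and port; replace with your own.
    target = persisted_job_file(
        "/home/containeruser",
        "myhost.example.org",
        8443,
        "1748b3d0-8c4b-11de-8543-b8f655c16264",
    )
    remove_persisted_job(target, dry_run=True)
```

The dry run is the default on purpose: a wrong job ID or a wrong `<hostname>-<port>` directory should show up as a "not found" message rather than a surprise deletion. Pass `dry_run=False` once the printed path matches the stuck job.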
