Hi,

The problem you describe, which is summarized in the bug you mention, is an architectural problem in WS-GRAM in 4.0. We fixed it in the 4.2 branch. The fix required changing the interface, which is why we can't port it back to the 4.0 branch. If you can upgrade to the 4.2 series, I'd recommend doing so.
With 4.0.x there is currently no other way than:

1. Stop the container.
2. Delete the problematic job from the persistence directory (by default ~/.globus of the user who runs the container). In your case, remove the file
   ~containeruser/.globus/<hostname>-<port>/ManagedExecutableJobResourceStateType/1748b3d0-8c4b-11de-8543-b8f655c16264.xml
3. Restart the container.

-Martin

Hazlewood, Victor Gene wrote:
> Hey GTers,
>
> Running WSRF v 4.0.8-r2 on a Cray XT5. Have a user job that looks like
> it has gone into an unresolvable state and the log file is filling up
> with messages about not being able to resolve the FailureFileCleanUp
> state. Anyone have any suggestions how to get rid of this? Have
> looked at the documentation (nothing I found covers this), looked at
> bugzilla (http://bugzilla.mcs.anl.gov/globus/show_bug.cgi?id=5247 is
> close but says it will be fixed in a future release, but gives no
> instructions how to resolve it currently). I'm running out of ideas.
>
> The recurring messages are
>
> 2009-08-29 12:40:02,267 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,getInternalState:1666] getting resource datum internalState
> 2009-08-29 12:40:02,267 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,remove:285] Waiting to be Done or Failed. Current state: FailureFileCleanUp
>
> Any help on how to resolve this would be appreciated (besides the "it is
> fixed in the next release" type of resolution).
>
> Below are the complete job entries for the job.
>
> -Victor
>
> Victor Hazlewood, CISSP
> Senior HPC Systems Analyst
> National Institute for Computational Science
> University of Tennessee
> http://www.nics.tennessee.edu/ <http://www.nics.utk.edu/>
>
> Complete log file entry:
>
> 2009-08-28 20:13:32,174 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:142] Entering initialize()
> 2009-08-28 20:13:32,175 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:147] at super.initialize()
> 2009-08-28 20:13:32,180 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:153] at initSecurity()
> 2009-08-28 20:13:32,180 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:316] Entering initSecurity()
> 2009-08-28 20:13:32,182 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:338] resource credential subject:
> 2009-08-28 20:13:32,183 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:346] setting resource securty grid map...
> 2009-08-28 20:13:32,183 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initSecurity:356] Leaving initSecurity()
> 2009-08-28 20:13:32,186 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initVariableMap:704] GLOBUS_SCRATCH_DIR:${GLOBUS_USER_HOME}/.globus/scratch
> 2009-08-28 20:13:32,370 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1290] resolving variables in attribute environment
> 2009-08-28 20:13:32,370 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1295] looking at string ${GLOBUS_USER_HOME}
> 2009-08-28 20:13:32,370 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1296] found $ at index 0
> 2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1302] found '{'---looks like a reference
> 2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_HOME in {GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch, GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2, GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264, GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}
> 2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_HOME to value /nics/c/home/turuncu
> 2009-08-28 20:13:32,371 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1392] Final string is /nics/c/home/turuncu
> 2009-08-28 20:13:32,372 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1290] resolving variables in attribute environment
> 2009-08-28 20:13:32,372 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1295] looking at string ${GLOBUS_USER_NAME}
> 2009-08-28 20:13:32,372 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1296] found $ at index 0
> 2009-08-28 20:13:32,372 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1302] found '{'---looks like a reference
> 2009-08-28 20:13:32,373 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_NAME in {GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch, GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2, GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264, GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}
> 2009-08-28 20:13:32,373 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_NAME to value turuncu
> 2009-08-28 20:13:32,373 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1392] Final string is turuncu
> 2009-08-28 20:13:32,373 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1290] resolving variables in attribute environment
> 2009-08-28 20:13:32,374 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1295] looking at string ${GLOBUS_SCRATCH_DIR}
> 2009-08-28 20:13:32,374 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1296] found $ at index 0
> 2009-08-28 20:13:32,374 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1302] found '{'---looks like a reference
> 2009-08-28 20:13:32,374 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1348] looking up GLOBUS_SCRATCH_DIR in {GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch, GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2, GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264, GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}
> 2009-08-28 20:13:32,375 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1353] mapped GLOBUS_SCRATCH_DIR to value ${GLOBUS_USER_HOME}/.globus/scratch
> 2009-08-28 20:13:32,375 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1295] looking at string ${GLOBUS_USER_HOME}/.globus/scratch
> 2009-08-28 20:13:32,375 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1296] found $ at index 0
> 2009-08-28 20:13:32,375 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1302] found '{'---looks like a reference
> 2009-08-28 20:13:32,376 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1348] looking up GLOBUS_USER_HOME in {GLOBUS_SCRATCH_DIR=${GLOBUS_USER_HOME}/.globus/scratch, GLOBUS_LOCATION=/usr/local/globus-wsrf-4.0.8-r2, GLOBUS_JOB_ID=1748b3d0-8c4b-11de-8543-b8f655c16264, GLOBUS_USER_HOME=/nics/c/home/turuncu, GLOBUS_USER_NAME=turuncu}
> 2009-08-28 20:13:32,376 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1353] mapped GLOBUS_USER_HOME to value /nics/c/home/turuncu
> 2009-08-28 20:13:32,376 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,resolveVariableInString:1392] Final string is /nics/c/home/turuncu/.globus/scratch
> 2009-08-28 20:13:32,377 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initExtraPerlAttributes:588] Adding extra attributes to the Perl job attribute map
> 2009-08-28 20:13:32,377 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initExtraPerlAttributes:615] checking for condorness of PBS
> 2009-08-28 20:13:32,421 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:171] Perl Job Description: $description = {
> jobdir => [ '/nics/c/home/turuncu/.globus/1748b3d0-8c4b-11de-8543-b8f655c16264' ],
> environment => [ [ 'GLOBUS_LOCATION', '/usr/local/globus-wsrf-4.0.8-r2' ], [ 'X509_CERT_DIR', '/etc/grid-security/certificates' ], [ 'X509_USER_PROXY', '' ], [ 'X509_USER_CERT', '' ], [ 'X509_USER_KEY', '' ], [ 'HOME', '/nics/c/home/turuncu' ], [ 'LOGNAME', 'turuncu' ], [ 'SCRATCH_DIRECTORY', '/nics/c/home/turuncu/.globus/scratch' ], [ 'JAVA_HOME', '/opt/java/jdk1.6.0_05/jre' ], [ 'GLOBUS_GRAM_JOB_HANDLE', 'https://grid.nics.utk.edu:4321/wsrf/services/ManagedExecutableJobService?1748b3d0-8c4b-11de-8543-b8f655c16264' ], ],
> 2009-08-28 20:13:32,421 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,initialize:178] Leaving initialize()
> 2009-08-28 20:13:32,429 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,getInternalState:1666] getting resource datum internalState
> 2009-08-28 20:13:32,429 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,remove:275] Remove called with external state Done and internal state FailureFileCleanUp
> 2009-08-28 20:13:32,429 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,remove:285] Waiting to be Done or Failed. Current state: FailureFileCleanUp
> 2009-08-28 20:13:34,432 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,getInternalState:1666] getting resource datum internalState
> 2009-08-28 20:13:34,432 DEBUG ManagedExecutableJobResource.1748b3d0-8c4b-11de-8543-b8f655c16264 [Thread-7,remove:285] Waiting to be Done or Failed. Current state: FailureFileCleanUp
>
> (last two messages repeated 29536 times)
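P.S. For anyone who has to do the 4.0.x cleanup from the reply at the top more than once, the file-removal step could be scripted roughly as below. This is only a sketch under assumptions: the function names and the `dry_run` flag are made up for illustration, the directory layout is taken from the path quoted in the reply, and it deliberately does not stop or restart the container. Stop the container yourself before running it and restart it afterwards.

```python
from pathlib import Path

# Subdirectory name taken from the path quoted in the reply above.
RESOURCE_TYPE_DIR = "ManagedExecutableJobResourceStateType"


def persisted_job_file(globus_dir, host_port_dir, job_id):
    """Build the path of the persisted job resource file.

    globus_dir    -- persistence directory (by default ~/.globus of the
                     user who runs the container)
    host_port_dir -- name of the <hostname>-<port> subdirectory
    job_id        -- job UUID as it appears in the container log
    (All three parameter names are hypothetical.)
    """
    return Path(globus_dir) / host_port_dir / RESOURCE_TYPE_DIR / f"{job_id}.xml"


def remove_persisted_job(globus_dir, host_port_dir, job_id, dry_run=True):
    """Delete the persisted file for one job and return its path.

    With dry_run=True (the default) nothing is deleted, so the path can
    be checked first. Run this only while the container is stopped.
    """
    target = persisted_job_file(globus_dir, host_port_dir, job_id)
    if not target.is_file():
        raise FileNotFoundError(f"no persisted job file at {target}")
    if not dry_run:
        target.unlink()
    return target
```

Calling `remove_persisted_job(...)` with the default `dry_run=True` only returns the path that would be removed; pass `dry_run=False` once the path looks right. The default is a dry run because deleting the wrong file would drop another job's persisted state.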
