Dear list and particularly GT 4.x and Condor GRAM users,

To those of you who may be interested, I have confirmed that the Condor SEG
is sensitive to the order and formatting of the elements in
globus-condor.log. If I shift them around manually so that they conform to
what a <c> block looks like with an older version of Condor, Globus
properly picks up the fact that my jobs are complete. However, as it goes
to stage out the files, I get an RFT failure saying the "Database driver is
not initialized, Need to setup database". My question is: do I have
something wrong with RFT in this Globus installation? This seems unlikely
since I have several others configured the same way without any issue, and
there are no problems staging files *in*. Does it have to do with my
manually mucking around with the globus-condor.log file? A relevant bit of
the container.log is below. I'd appreciate your input on this, and also on
a possible workaround for the SEG parsing problem (Condor version
downgrade, modifications to the SEG parsing code, or otherwise).

thanks,
Adam



On Thu, Aug 30, 2012 at 4:09 PM, Adam Bazinet <[email protected]> wrote:

> Hi all,
>
> I just installed a new GT 4.2.1 on a Condor resource, and while jobs run
> and complete in Condor fine, the finished status is not being picked up by
> Globus appropriately.
>
> Is the GT 4.2.1 sensitive to the order/formatting of the elements in
> globus-condor.log?
>
> For example, here is a JobTerminatedEvent from globus-condor.log from an
> older Condor/Globus installation that works properly:
>
> <c>
>     <a n="MyType"><s>JobTerminatedEvent</s></a>
>     <a n="EventTypeNumber"><i>5</i></a>
>     <a n="MyType"><s>JobTerminatedEvent</s></a>
>     <a n="EventTime"><s>2012-08-29T00:32:04</s></a>
>     <a n="Cluster"><i>18720</i></a>
>     <a n="Proc"><i>36</i></a>
>     <a n="Subproc"><i>0</i></a>
>     <a n="TerminatedNormally"><b v="t"/></a>
>     <a n="ReturnValue"><i>0</i></a>
>     <a n="RunLocalUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
>     <a n="RunRemoteUsage"><s>Usr 0 07:01:33, Sys 0 00:00:13</s></a>
>     <a n="TotalLocalUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
>     <a n="TotalRemoteUsage"><s>Usr 0 07:01:33, Sys 0 00:00:13</s></a>
>     <a n="SentBytes"><r>3.244000000000000E+04</r></a>
>     <a n="ReceivedBytes"><r>4.281464000000000E+06</r></a>
>     <a n="TotalSentBytes"><r>3.244000000000000E+04</r></a>
>     <a n="TotalReceivedBytes"><r>5.565903200000000E+07</r></a>
> </c>
>
> and here is the one from the new resource that is NOT working properly:
>
> <c>
>     <a n="MyType"><s>JobTerminatedEvent</s></a>
>     <a n="TotalLocalUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
>     <a n="Proc"><i>0</i></a>
>     <a n="EventTime"><s>2012-08-28T16:07:17</s></a>
>     <a n="TotalRemoteUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
>     <a n="TotalReceivedBytes"><r>2.231476000000000E+06</r></a>
>     <a n="ReturnValue"><i>0</i></a>
>     <a n="RunRemoteUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
>     <a n="RunLocalUsage"><s>Usr 0 00:00:00, Sys 0 00:00:00</s></a>
>     <a n="SentBytes"><r>3.979500000000000E+04</r></a>
>     <a n="Cluster"><i>20</i></a>
>     <a n="TotalSentBytes"><r>3.979500000000000E+04</r></a>
>     <a n="Subproc"><i>0</i></a>
>     <a n="CurrentTime"><e>time()</e></a>
>     <a n="EventTypeNumber"><i>5</i></a>
>     <a n="ReceivedBytes"><r>2.231476000000000E+06</r></a>
>     <a n="TerminatedNormally"><b v="t"/></a>
> </c>
>
> Basically, the only variable that's different about this resource is that
> it's running a newer version of Condor. My hunch is that this broke
> compatibility somewhere along the way. Can someone confirm this, or provide
> another mechanism to debug? I'm attaching the container.log from the
> resource in question, which has GRAM debugging enabled as some jobs came
> in. It didn't really show me anything, though.
>
> thanks,
> Adam
>
>
