Hi Yuriy,

This clicks with the experience I have with GRAM5. GT5 tries to "merge" jobs running under the same account, so that they are managed by the same job manager. This improves scaling - if a user submits a large number of jobs, you'd still have only one job manager running.
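
To make the merging concrete, here is a toy Python sketch of the idea. It is not GRAM5 code: `Job`, `assign_managers` and the two key functions are names made up for illustration, and the only assumption is that jobs sharing a key share one job manager. The key can be the local account alone, or the finer <local account, DN> pair.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: str
    account: str  # local account the job runs under
    dn: str       # certificate DN of the submitting user


def by_account(job):
    """Group key: one job manager per local account."""
    return job.account


def by_account_and_dn(job):
    """Group key: one job manager per <local account, DN> pair."""
    return (job.account, job.dn)


def assign_managers(jobs, key=by_account):
    """Return {manager key: [job ids]}; jobs with equal keys share a manager."""
    managers = defaultdict(list)
    for job in jobs:
        managers[key(job)].append(job.job_id)
    return dict(managers)


# Hypothetical example: two users submitting through one shared account.
jobs = [
    Job("j1", "grid-shared", "/CN=Alice"),
    Job("j2", "grid-shared", "/CN=Bob"),
    Job("j3", "grid-shared", "/CN=Alice"),
]

print(assign_managers(jobs))                         # a single manager for all three jobs
print(assign_managers(jobs, key=by_account_and_dn))  # one manager per user
```

With the account-only key, all three jobs end up under one manager regardless of who submitted them; with the finer key, each DN gets its own.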

We've run into issues when we used this with our model of shared accounts (yes, we are now moving from shared accounts to account pools). GRAM5 was trying to run all jobs of all users under the shared account under a single job manager - with the certificate submitted with the first job being used for all. That was breaking any attempts to retrieve the job status for the other jobs - and was solved by running a separate job manager for each combination of <local account, DN>.

But what you are observing now clicks into this experience. It explains why it's happening - but it does not convince me that this is the right thing to happen. Having inconsistent behavior when a short-lived proxy expires (depending on whether there is another job running with a longer-lived proxy) is quite a bad thing.

I understand the job manager cannot continue running when the proxy expires, but at least reconnecting to the job / restarting the job manager and getting reliable audit messages should work.

That looks like a GRAM5 bug to me. Would you be able to investigate and collect more data on how the restart breaks?

Cheers,
Vlad

Yuriy Halytskyy wrote:
> Hi,
>
> When job proxy expires, its manager dies, but the job itself keeps
> running. There is no way to check its status, and even when job is
> submitted with save_state=true, the restart does not work. Also, audit
> never puts the record of completed job into the database.
>
> On the other hand if I submit two jobs, first one with long proxy, and
> second job with shorter proxy, even when second proxy expires I can
> still query the job as long as first proxy is valid and job manager is
> running.
>
> GRAM4 never had this problem, even when proxy expires job status is
> still available and it is properly audited. Is it possible for gram5
> to have the same behaviour? At least being able to restart job manager
> after proxy expiration and have it properly audited.
>
>
> Cheers,
> Yuriy

--
Vladimir Mencl, Ph.D.
E-Research Services and Systems Consultant
BlueFern Computing Services
University of Canterbury
Private Bag 4800
Christchurch 8140
New Zealand

http://www.bluefern.canterbury.ac.nz
mailto:[email protected]
Phone: +64 3 364 3012
Mobile: +64 21 997 352
Fax: +64 3 364 3002
