Agreed. @Jonathan, can you file jiras for the 2 cases with related stack traces?
The domain handling might be the trickier issue as we will need to disable ATS publishing in case the domain could not be created.

-- Hitesh

On Jan 21, 2015, at 4:03 PM, Jonathan Eagles <[email protected]> wrote:

> I just checked this behavior in a secure cluster and if it fails to get a
> timeline server delegation token or fails to post the domain, the job will
> fail. We should consider making these operations "best effort" as well.
>
> On Jan 21, 2015 5:33 PM, "Hitesh Shah" <[email protected]> wrote:
>
>> Actually at this time, the current impl just logs a WARN when there is a
>> failure pushing data to ATS. ATS is not treated as a critical entity as it
>> is not needed for job recovery.
>>
>> -- Hitesh
>>
>> On Jan 21, 2015, at 3:01 PM, Rohini Palaniswamy <[email protected]>
>> wrote:
>>
>>> Folks,
>>> In the middle of big discussion on how to get delegation tokens from
>>> ATS for Oozie jobs, another question came up. What is the behaviour of
>>> running tez jobs if ATS goes down. Haven't tried it out, but my guess is
>>> the job is going to fail. Or do we do something now to handle the failure
>>> and still have the job complete successfully?
>>>
>>> Regards,
>>> Rohini
