> By adding this dependency on HDFS to be running, you can no longer start your cluster in any order; HDFS has to come before Oozie. We should have it defer the loading of the jars until it either somehow detects that HDFS is up or at the first job submission.
I am not clear what the problem is here. Even now, with 3.3.x and 4.0, if HDFS is not up, Oozie cannot submit jobs to the cluster, nor can users submit jobs to Oozie, because parsing workflow.xml will fail. What gets worse if Oozie copies the sharelib jars during startup or before submitting the first job? Even if the admin has to run oozie-setup.sh before starting Oozie, HDFS needs to be up.

Regards,
Rohini

On Fri, Sep 27, 2013 at 6:36 PM, Alejandro Abdelnur <t...@cloudera.com> wrote:

> IMO, making a required order of services startup is a no go.
>
> On Fri, Sep 27, 2013 at 5:10 PM, Robert Kanter <rkan...@cloudera.com> wrote:
>
> > By adding this dependency on HDFS to be running, you can no longer start
> > your cluster in any order; HDFS has to come before Oozie. We should have
> > it defer the loading of the jars until it either somehow detects that HDFS
> > is up or at the first job submission.
> >
> > > An admin can put oozie in safe mode after it starts and then issue a series
> > > of commands to do the necessary maintenance (install/upgrade/purge) of
> > > sharelibs.
> >
> > Also, I agree with Alejandro's point earlier: if the purpose of all this is
> > to make this all automatic so that the admin doesn't have to do anything,
> > then this won't work; it sounds like more steps for the admin to deal with.
> >
> > What if we just enhanced the oozie-setup.sh script to add the logic for
> > temp and staging sharelib stuff and didn't have Oozie try to do anything
> > itself. e.g. the admin wants to transition to a new set of sharelibs,
> > they'd run oozie-setup.sh upgrade blah blah. Or perhaps make the admin
> > command handle this logic and do it only when the command is run; then we
> > don't have to worry about starting up and it would simplify the HA work
> > because the different Oozies won't be changing the sharelib at startup.
> > Thoughts?
> >
> > - Robert
> >
> > On Fri, Sep 27, 2013 at 11:58 AM, Rohini Palaniswamy <rohini.adi...@gmail.com> wrote:
> >
> > > Basically, the question I am trying to answer is: what was possible before
> > > when Oozie was up and hdfs was down?
> > > - Before, oozie could still get job submissions if hdfs was down. But we
> > >   are going to try to validate the workflow.xml from hdfs, and submission is
> > >   going to fail anyway. So I don't think that is a big issue.
> > > - It can't submit new jobs, and input dependency checks will fail anyway.
> > >   So that is also not an issue.
> > > - Users could continue to query the status of oozie; that will no longer be
> > >   possible, as Oozie will be down.
> > >
> > > Is there something else that I am missing?
> > >
> > > One thing that I can think of to address this is to defer loading to job
> > > submission time if hdfs is down during startup time.
> > >
> > > -Rohini
> > >
> > > On Fri, Sep 27, 2013 at 11:51 AM, Rohini Palaniswamy <rohini.adi...@gmail.com> wrote:
> > >
> > > > > Unless I'm missing something, is not just a missing hadoop conf issue,
> > > > > now Oozie won't start if HDFS is not running.
> > > >
> > > > Why would this be an issue? If sharelib is in hdfs and that is not
> > > > accessible, all the jobs would fail anyway and things are not going to work.
> > > >
> > > > Regards,
> > > > Rohini
> > > >
> > > > On Fri, Sep 27, 2013 at 8:08 AM, Alejandro Abdelnur <t...@cloudera.com> wrote:
> > > >
> > > >> afaik, the whole point of these changes was to make things adminless,
> > > >> thus the extra complexity. if we need admin intervention, i'd go to the
> > > >> previous model.
> > > >>
> > > >> thx
> > > >>
> > > >> Alejandro
> > > >> (phone typing)
> > > >>
> > > >> On Sep 26, 2013, at 22:46, Virag Kothari <vi...@yahoo-inc.com> wrote:
> > > >>
> > > >> > That's a good point. An alternative is to do share lib install after Oozie
> > > >> > starts.
> > > >> > An admin can put oozie in safe mode after it starts and then issue a series
> > > >> > of commands to do the necessary maintenance (install/upgrade/purge) of
> > > >> > sharelibs.
> > > >> > (OOZIE-1519 is already tracking admin upgrade of sharelibs)
> > > >> > Also, all these commands can optionally be made to accept a configuration
> > > >> > parameter so the hadoop confs are not required in oozie's class path.
> > > >> >
> > > >> > Thanks,
> > > >> > Virag
> > > >> >
> > > >> > On 9/26/13 1:54 PM, "Alejandro Abdelnur" <t...@cloudera.com> wrote:
> > > >> >
> > > >> >> Unless I'm missing something, is not just a missing hadoop conf issue,
> > > >> >> now Oozie won't start if HDFS is not running.
> > > >> >>
> > > >> >> This does not seem right.
> > > >> >>
> > > >> >> If we can sort this out i would prefer the previous manual update of
> > > >> >> the sharelib via oozie-setup.sh
> > > >> >>
> > > >> >> thanks.
> > > >> >>
> > > >> >> On Thu, Sep 26, 2013 at 1:22 PM, Virag Kothari <vi...@yahoo-inc.com> wrote:
> > > >> >>
> > > >> >>> There is no need for this setting if you have the hadoop configs under
> > > >> >>> conf/hadoop-conf. Having this setting would be more useful if you have
> > > >> >>> oozie configured to talk with multiple hadoops.
> > > >> >>> These configs are now required at startup as the share lib service needs to
> > > >> >>> connect to the filesystem on which the share lib jars need to be copied
> > > >> >>> (probably only require the core-site.xml, need to check)
> > > >> >>> Before also it was recommended to have the hadoop configs on oozie-server,
> > > >> >>> so users don't need to define some of these configurations in their
> > > >> >>> workflows. As it is now mandatory, we should make it clear in our
> > > >> >>> documentation.
> > > >> >>>
> > > >> >>> Thanks,
> > > >> >>> Virag
> > > >> >>>
> > > >> >>> From: bowen zhang <bowenzhang...@yahoo.com>
> > > >> >>> Reply-To: bowen zhang <bowenzhang...@yahoo.com>
> > > >> >>> Date: Thursday, September 26, 2013 12:20 PM
> > > >> >>> To: "dev@oozie.apache.org" <dev@oozie.apache.org>, Virag Kothari <vi...@yahoo-inc.com>
> > > >> >>> Subject: Re: issue after OOZIE-1461
> > > >> >>>
> > > >> >>> I second Robert's concern. Right now, the easiest way for me to
> > > >> >>> get around this is to point
> > > >> >>> "oozie.service.HadoopAccessorService.hadoop.configurations" to
> > > >> >>> "*=Absolute path of my hadoop conf".
> > > >> >>> Bowen
> > > >> >>>
> > > >> >>> ________________________________
> > > >> >>> From: Robert Kanter <rkan...@cloudera.com>
> > > >> >>> To: Virag Kothari <vi...@yahoo-inc.com>
> > > >> >>> Cc: "dev@oozie.apache.org" <dev@oozie.apache.org>; bowen zhang <bowenzhang...@yahoo.com>
> > > >> >>> Sent: Thursday, September 26, 2013 11:11 AM
> > > >> >>> Subject: Re: issue after OOZIE-1461
> > > >> >>>
> > > >> >>> Is there any way to make that step not required, or to at least streamline
> > > >> >>> it somehow? I imagine we'll see many questions from users wondering why
> > > >> >>> their Oozie server doesn't start because of this. Also, Oozie used to work
> > > >> >>> out-of-the-box after running a few scripts; now it requires manually
> > > >> >>> setting the oozie.service.HadoopAccessorService.hadoop.configurations
> > > >> >>> property.
> > > >> >>>
> > > >> >>> thanks
> > > >> >>> - Robert
> > > >> >>>
> > > >> >>> On Mon, Sep 23, 2013 at 2:57 PM, Virag Kothari <vi...@yahoo-inc.com> wrote:
> > > >> >>>
> > > >> >>>> Bowen,
> > > >> >>>>
> > > >> >>>> Including hadoop configs on oozie-server is documented at
> > > >> >>>> https://oozie.apache.org/docs/3.3.2/AG_HadoopConfiguration.html.
> > > >> >>>> Documentation for sharelib installation can be updated once OOZIE-1518 and
> > > >> >>>> OOZIE-1519 go in. We can also update the quick start guide at that time.
> > > >> >>>>
> > > >> >>>> Regards,
> > > >> >>>> Virag
> > > >> >>>>
> > > >> >>>> On 9/23/13 2:31 PM, "bowen zhang" <bowenzhang...@yahoo.com> wrote:
> > > >> >>>>
> > > >> >>>>> Virag,
> > > >> >>>>> Can you add documentation for this change since the current build and
> > > >> >>>>> setup of oozie doesn't cover this?
> > > >> >>>>> Bowen
> > > >> >>>>>
> > > >> >>>>> ________________________________
> > > >> >>>>> From: Virag Kothari <vi...@yahoo-inc.com>
> > > >> >>>>> To: bowen zhang <bowenzhang...@yahoo.com>; "dev@oozie.apache.org" <dev@oozie.apache.org>; Robert Kanter <rkan...@cloudera.com>
> > > >> >>>>> Sent: Monday, September 16, 2013 2:58 PM
> > > >> >>>>> Subject: Re: issue after OOZIE-1461
> > > >> >>>>>
> > > >> >>>>> Hi Robert/Bowen,
> > > >> >>>>>
> > > >> >>>>> The hadoop configs need to be on the class path (hadoop-conf dir or
> > > >> >>>>> oozie-server/lib) so the HadoopAccessorService can create the
> > > >> >>>>> appropriate filesystem object. This will fix your current issue.
> > > >> >>>>> But there is one more problem you might face while running a job where
> > > >> >>>>> permissions are not recursively applied. This fix is in OOZIE-1528 and
> > > >> >>>>> will be checked in shortly.
> > > >> >>>>>
> > > >> >>>>> Thanks,
> > > >> >>>>> Virag
> > > >> >>>>>
> > > >> >>>>> From: bowen zhang <bowenzhang...@yahoo.com>
> > > >> >>>>> Reply-To: bowen zhang <bowenzhang...@yahoo.com>
> > > >> >>>>> Date: Monday, September 16, 2013 2:49 PM
> > > >> >>>>> To: "dev@oozie.apache.org" <dev@oozie.apache.org>, Virag Kothari <vi...@yahoo-inc.com>
> > > >> >>>>> Subject: Re: Fwd: issue after OOZIE-1461
> > > >> >>>>>
> > > >> >>>>> what I found is the variable uri from tmpShareLibPath has authority of
> > > >> >>>>> "null".
> > > >> >>>>>
> > > >> >>>>> ________________________________
> > > >> >>>>> From: Robert Kanter <rkan...@cloudera.com>
> > > >> >>>>> To: Virag Kothari <vi...@yahoo-inc.com>; "dev@oozie.apache.org" <dev@oozie.apache.org>
> > > >> >>>>> Sent: Monday, September 16, 2013 2:39 PM
> > > >> >>>>> Subject: Fwd: issue after OOZIE-1461
> > > >> >>>>>
> > > >> >>>>> Hi Virag,
> > > >> >>>>>
> > > >> >>>>> After OOZIE-1461, Bowen (and I too) have run into this exception when
> > > >> >>>>> starting Oozie, so it fails and won't start. I checked, and for me at
> > > >> >>>>> least, the share/lib/ dir looks like it has the correct permissions.
> > > >> >>>>> Any thoughts?
> > > >> >>>>>
> > > >> >>>>> thanks
> > > >> >>>>> - Robert
> > > >> >>>>>
> > > >> >>>>> On Mon, Sep 16, 2013 at 2:26 PM, Bowen Zhang <bzh...@hortonworks.com> wrote:
> > > >> >>>>>
> > > >> >>>>>> Hi Robert,
> > > >> >>>>>> After rebasing to the trunk, I hit this error when trying to bring up
> > > >> >>>>>> oozie.
> > > >> >>>>>> org.apache.oozie.service.ServiceException: E0100: Could not initialize service [org.apache.oozie.service.ShareLibService], Failed to set permissions of path: /user/bzhang/share/lib/tmp-20130916135406/oozie to 0755
> > > >> >>>>>>     at org.apache.oozie.service.ShareLibService.init(ShareLibService.java:81)
> > > >> >>>>>>     at org.apache.oozie.service.Services.setServiceInternal(Services.java:368)
> > > >> >>>>>>     at org.apache.oozie.service.Services.setService(Services.java:354)
> > > >> >>>>>>     at org.apache.oozie.service.Services.loadServices(Services.java:287)
> > > >> >>>>>>     at org.apache.oozie.service.Services.init(Services.java:208)
> > > >> >>>>>>     at org.apache.oozie.servlet.ServicesLoader.contextInitialized(ServicesLoader.java:45)
> > > >> >>>>>>     at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4206)
> > > >> >>>>>>     at org.apache.catalina.core.StandardContext.start(StandardContext.java:4705)
> > > >> >>>>>>     at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
> > > >> >>>>>>     at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
> > > >> >>>>>>     at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
> > > >> >>>>>>     at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943)
> > > >> >>>>>>     at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778)
> > > >> >>>>>>     at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504)
> > > >> >>>>>>     at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
> > > >> >>>>>>     at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
> > > >> >>>>>>     at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
> > > >> >>>>>>     at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
> > > >> >>>>>>     at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
> > > >> >>>>>>     at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
> > > >> >>>>>>     at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
> > > >> >>>>>>     at org.apache.catalina.core.StandardService.start(StandardService.java:525)
> > > >> >>>>>>     at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
> > > >> >>>>>>     at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
> > > >> >>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >> >>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > > >> >>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > >> >>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
> > > >> >>>>>>     at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
> > > >> >>>>>>     at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
> > > >> >>>>>> Caused by: java.io.IOException: Failed to set permissions of path: /user/bzhang/share/lib/tmp-20130916135406/oozie to 0755
> > > >> >>>>>>     at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
> > > >> >>>>>>     at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
> > > >> >>>>>>     at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
> > > >> >>>>>>     at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:286)
> > > >> >>>>>>     at org.apache.oozie.service.ShareLibService.copyJarContainingClasses(ShareLibService.java:109)
> > > >> >>>>>>     at org.apache.oozie.service.ShareLibService.copyLauncherJarsToShareLib(ShareLibService.java:90)
> > > >> >>>>>>     at org.apache.oozie.service.ShareLibService.init(ShareLibService.java:76)
> > > >> >>>>>>     ... 29 more
> > > >> >>>>>>
> > > >> >>>>>> It might have sth to do with the version of hadoop that hadoopAccessor
> > > >> >>>>>> cannot change file permission. Do you see this problem?
> > > >> >>>>>>
> > > >> >>>>>> CONFIDENTIALITY NOTICE
> > > >> >>>>>> NOTICE: This message is intended for the use of the individual or entity
> > > >> >>>>>> to which it is addressed and may contain information that is confidential,
> > > >> >>>>>> privileged and exempt from disclosure under applicable law. If the reader
> > > >> >>>>>> of this message is not the intended recipient, you are hereby notified that
> > > >> >>>>>> any printing, copying, dissemination, distribution, disclosure or
> > > >> >>>>>> forwarding of this communication is strictly prohibited. If you have
> > > >> >>>>>> received this communication in error, please contact the sender immediately
> > > >> >>>>>> and delete it from your system. Thank You.
> > > >> >>
> > > >> >> --
> > > >> >> Alejandro
>
> --
> Alejandro
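
For reference, the deferral suggested in the thread (try the sharelib copy at startup without failing startup, then retry at the first job submission if HDFS was down) might look roughly like the sketch below. This is only an illustration under stated assumptions: the names DeferredShareLibInstaller, CopyStep, tryAtStartup and ensureInstalled are hypothetical and are not part of Oozie, and the copy step is a stand-in for whatever ShareLibService really does against the filesystem.

```java
import java.io.IOException;

/** Hypothetical helper (not an Oozie class): runs a one-time copy step lazily. */
final class DeferredShareLibInstaller {

    /** Stand-in for the real sharelib copy; expected to throw while HDFS is unreachable. */
    interface CopyStep {
        void run() throws IOException;
    }

    private final CopyStep copyStep;
    private boolean installed = false;

    DeferredShareLibInstaller(CopyStep copyStep) {
        this.copyStep = copyStep;
    }

    /** Startup hook: attempt the copy once, but never let a failure stop the server. */
    synchronized boolean tryAtStartup() {
        try {
            ensureInstalled();
            return true;
        } catch (IOException e) {
            // HDFS is probably down; leave the work for the first job submission.
            return false;
        }
    }

    /** Submission hook: copy if still pending; a failure only fails this submission. */
    synchronized void ensureInstalled() throws IOException {
        if (!installed) {
            copyStep.run();
            installed = true;
        }
    }

    synchronized boolean isInstalled() {
        return installed;
    }
}

/** Small demo: HDFS is "down" at startup and "up" by the first submission. */
final class DeferredShareLibDemo {
    public static void main(String[] args) throws IOException {
        final boolean[] hdfsUp = {false};
        DeferredShareLibInstaller installer = new DeferredShareLibInstaller(() -> {
            if (!hdfsUp[0]) {
                throw new IOException("HDFS not reachable (simulated)");
            }
            System.out.println("copying sharelib jars (simulated)");
        });

        System.out.println("installed at startup: " + installer.tryAtStartup());
        hdfsUp[0] = true;
        installer.ensureInstalled(); // first job submission triggers the copy
        System.out.println("installed after first submission: " + installer.isInstalled());
    }
}
```

With this shape the server can start in any order relative to HDFS, and status queries keep working while HDFS is down; the trade-off is that the first submission after HDFS comes back pays the cost of the copy. In an HA setup, several servers could still race on the copy, which this sketch does not address.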