Re: Oozie job in PREP state for a long time

Dongying Jiao Thu, 08 Jun 2017 02:19:18 -0700

Hi Attila:
The CallableQueueService values in my oozie-site.xml are the defaults. I will
try increasing them and see what happens.
Thank you very much.


Best Regards,
Dong Ying

2017-06-08 1:18 GMT+08:00 Attila Sasvari <asasv...@cloudera.com>:

>  In the Oozie book (Apache Oozie: The Workflow Scheduler for Hadoop by
> Mohammad Kamrul Islam & Aravind Srinivasan) there are some hints on server
> tuning (see Chapter 11 Oozie operations / Service settings / The
> CallableQueueService).
>
> Default settings for the CallableQueueService are quite conservative. If
> you increase oozie.service.CallableQueueService.threads, change
> oozie.service.CallableQueueService.callable.concurrency accordingly, and
> consider increasing the Oozie server’s JVM heap size. For example, if you
> bump oozie.service.CallableQueueService.threads to 100, set
> oozie.service.CallableQueueService.callable.concurrency to 30. You can
> also adjust oozie.service.CallableQueueService.queue.size.
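For illustration, the CallableQueueService settings described above would go into oozie-site.xml roughly as follows. This is a sketch, not a recommended configuration: threads=100 and callable.concurrency=30 are the example values from Attila's message, while the queue.size value is a hypothetical placeholder to be sized for your own workload. When a callable type is over its concurrency limit, Oozie requeues it with a delay, which is what the "max concurrency for callable [...] exceeded, requeueing" warnings quoted in this thread report. Changes to oozie-site.xml generally take effect only after the Oozie server is restarted.

```xml
<!-- oozie-site.xml (fragment): CallableQueueService tuning sketch.
     threads and callable.concurrency use the example values from the thread;
     queue.size is a HYPOTHETICAL placeholder, not a recommendation. -->
<configuration>
  <!-- Number of threads used to execute queued callables (commands). -->
  <property>
    <name>oozie.service.CallableQueueService.threads</name>
    <value>100</value>
  </property>
  <!-- Max callables of the same type (e.g. a command or action type) running
       at once; above this, Oozie requeues the callable with a delay. -->
  <property>
    <name>oozie.service.CallableQueueService.callable.concurrency</name>
    <value>30</value>
  </property>
  <!-- Max number of callables waiting in the queue (placeholder value). -->
  <property>
    <name>oozie.service.CallableQueueService.queue.size</name>
    <value>20000</value>
  </property>
</configuration>
```

Raising threads without raising callable.concurrency (or vice versa) tends to leave one of the two as the bottleneck, which is why the message above suggests changing them together.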
>
> However, finding optimal settings for your Oozie server really depends on
> your environment (e.g. hardware size, resources, server capacity) and
> workflow characteristics.
>
> Hope this helps,
> Attila
>
> On Wed, Jun 7, 2017 at 6:38 AM, Dongying Jiao <pineapple...@gmail.com>
> wrote:
>
> > Hi Andras and Attila:
> > Thanks for your advice.
> > I will check the cluster utilization when this job runs next time, but I found
> > some warnings in oozie.log:
> >
> > 2017-06-05 02:18:18,952  WARN CallableQueueService:523 - SERVER[363748lpp2mn006.geicoddc.net] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] max concurrency for callable [switch] exceeded, requeueing with [500]ms delay
> >
> > 2017-06-05 02:18:38,433  WARN CallableQueueService:523 - SERVER[363748lpp2mn006.geicoddc.net] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] max concurrency for callable [#composite#job.notification] exceeded, requeueing with [500]ms delay
> >
> > Does it mean I should increase
> > oozie.service.CallableQueueService.callable.concurrency?
> >
> > BTW, I am using Oozie 4.2.0.
> >
> > Thanks
> >
> >
> > 2017-06-06 21:04 GMT+08:00 Attila Sasvari <asasv...@cloudera.com>:
> >
> > > Hi Dong Ying,
> > >
> > > Many thanks, Andras; these are good ideas.
> > >
> > > In addition, can you confirm that you have enough vcores / memory in
> > > your cluster for containers?
> > >
> > > You can check and try to adjust the following YARN settings:
> > > - yarn.nodemanager.resource.cpu-vcores
> > > - yarn.nodemanager.resource.memory-mb
> > >  (look at your yarn-site.xml / yarn-default.xml)
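A yarn-site.xml fragment for the two NodeManager settings listed above might look like this. The numbers are hypothetical placeholders, not values taken from this cluster; they should be derived from each NodeManager host's actual cores and RAM, leaving headroom for the OS and other services running on the host. NodeManagers typically need a restart to pick up the change.

```xml
<!-- yarn-site.xml (fragment) on each NodeManager host.
     Values are HYPOTHETICAL placeholders; size them to the real hardware. -->
<configuration>
  <!-- Number of vcores this NodeManager may allocate to containers. -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>16</value>
  </property>
  <!-- Amount of memory (MB) this NodeManager may allocate to containers. -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>49152</value>
  </property>
</configuration>
```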
> > >
> > > I would also recommend checking overall cluster utilization when Oozie
> > > jobs get into PREP state. Are there a lot of running jobs using a lot of
> > > resources (vcores, memory) at the time your coordinator tries to
> > > submit the job? You can look at the ResourceManager and the history
> > > server. Hope this helps.
> > >
> > > Best,
> > > - Attila
> > >
> > > * yarn settings -
> > > https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
> > >
> > > On Tue, Jun 6, 2017 at 2:26 PM, Andras Piros <andras.pi...@cloudera.com>
> > > wrote:
> > >
> > > > Hi Dong Ying,
> > > >
> > > > Do you see any log entries containing the snippet "queue is full" in the
> > > > Oozie webapp logs?
> > > >
> > > > What are the values of these parameters:
> > > >
> > > >    - oozie.service.CallableQueueService.queue.size
> > > >    - oozie.service.CallableQueueService.threads
> > > >    - oozie.service.CallableQueueService.callable.concurrency
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Andras
> > > >
> > > > On Tue, Jun 6, 2017 at 9:04 AM, Dongying Jiao <pineapple...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi:
> > > > > I have an Oozie coordinator job that runs at 02:00 every day. Sometimes
> > > > > the job runs smoothly, but sometimes it is stuck in PREP state for
> > > > > a long time.
> > > > >
> > > > > This is part of my coordinator.xml:
> > > > > <coordinator-app name="CoordinatorForETL"
> > > > >   frequency="${coordinatorFrequency}"
> > > > >   start="${startTime}" end="${endTime}" timezone="America/New_York"
> > > > >   xmlns="uri:oozie:coordinator:0.2">
> > > > >   <controls>
> > > > >     <timeout>10</timeout>
> > > > >     <concurrency>1</concurrency>
> > > > >   </controls>
> > > > >   <action>
> > > > >     <workflow>
> > > > > .............
> > > > > This is part of the workflow.xml:
> > > > > ......
> > > > > <start to="flowDecision"/>
> > > > > <decision name="flowDecision">
> > > > >   <switch>
> > > > >     <case to="q1">${workflowType eq "etl" || workflowType eq "all"}</case>
> > > > >     <case to="prediction">${workflowType eq "prediction"}</case>
> > > > >     <case to="errorOnDecision">${workflowType eq "cleaning"}</case>
> > > > >     <default to="errorOnDecision"/>
> > > > >   </switch>
> > > > > </decision>
> > > > > .......
> > > > >
> > > > > In my latest run, the job stayed in PREP state for about 30 minutes.
> > > > > According to the Oozie log, the "start" node of the job completed at
> > > > > 02:00, but the "flowDecision" node did not start executing until 02:32.
> > > > > During that period, the log shows other Oozie jobs running, but I didn't
> > > > > find any errors or exceptions in it.
> > > > >
> > > > > My understanding is that an Oozie job in PREP state has not been
> > > > > submitted to YARN yet, which is why I can't find an application ID for
> > > > > it in YARN. I wonder whether this relates to Oozie's queue mechanism or
> > > > > concurrency control. If so, do you have any experience tuning them?
> > > > >
> > > > > Thanks a lot.
> > > > >
> > > > > Best Regards,
> > > > > Dong Ying
> > > > >
> > > >
> > >
> >
>
