Hi Attila,

The CallableQueueService settings in my oozie-site.xml are still at their default values. I will try increasing them and see whether it helps.

Thank you very much.
Best Regards,
Dong Ying

2017-06-08 1:18 GMT+08:00 Attila Sasvari <asasv...@cloudera.com>:
> In the Oozie book (Apache Oozie: The Workflow Scheduler for Hadoop by
> Mohammad Kamrul Islam & Aravind Srinivasan) there are some hints on server
> tuning (see Chapter 11, Oozie Operations / Service Settings / The
> CallableQueueService).
>
> The default settings for the CallableQueueService are quite conservative.
> If you increase oozie.service.CallableQueueService.threads, change
> oozie.service.CallableQueueService.callable.concurrency accordingly, and
> consider increasing the Oozie server's JVM heap size. For example, if you
> bump oozie.service.CallableQueueService.threads to 100, set
> oozie.service.CallableQueueService.callable.concurrency to 30. You can also
> adjust oozie.service.CallableQueueService.queue.size.
>
> However, the optimal settings for your Oozie server really depend on your
> environment (e.g. hardware, resources, server capacity) and on your
> workflow characteristics.
>
> Hope this helps,
> Attila
>
> On Wed, Jun 7, 2017 at 6:38 AM, Dongying Jiao <pineapple...@gmail.com>
> wrote:
>
> > Hi Andras and Attila,
> > Thanks for your advice.
> > I will check cluster utilization the next time this job runs, but I found
> > some warnings in oozie.log:
> >
> > 2017-06-05 02:18:18,952 WARN CallableQueueService:523 - SERVER[363748lpp2mn006.geicoddc.net] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] max concurrency for callable [switch] exceeded, requeueing with [500]ms delay
> >
> > 2017-06-05 02:18:38,433 WARN CallableQueueService:523 - SERVER[363748lpp2mn006.geicoddc.net] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] max concurrency for callable [#composite#job.notification] exceeded, requeueing with [500]ms delay
> >
> > Does this mean I should increase
> > oozie.service.CallableQueueService.callable.concurrency?
> >
> > BTW, I am using Oozie 4.2.0.
> >
> > Thanks
> >
> > 2017-06-06 21:04 GMT+08:00 Attila Sasvari <asasv...@cloudera.com>:
> >
> > > Hi Dong Ying,
> > >
> > > Many thanks, Andras, these are good ideas.
> > >
> > > In addition, can you confirm that you have enough vcores / memory in
> > > your cluster for containers?
> > >
> > > You can check and try to adjust the following YARN settings:
> > > - yarn.nodemanager.resource.cpu-vcores
> > > - yarn.nodemanager.resource.memory-mb
> > > (see your yarn-site.xml / yarn-default.xml)
> > >
> > > I would also recommend checking overall cluster utilization when Oozie
> > > jobs get into PREP state. Are there many running jobs using a lot of
> > > resources (vcores, memory) at the time your coordinator tries to
> > > submit the job? You can look at the ResourceManager and the history
> > > server. Hope this helps.
> > >
> > > Best,
> > > - Attila
> > >
> > > * YARN settings -
> > > https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
> > >
> > > On Tue, Jun 6, 2017 at 2:26 PM, Andras Piros <andras.pi...@cloudera.com>
> > > wrote:
> > >
> > > > Hi Dong Ying,
> > > >
> > > > do you see any log entries containing the snippet "queue is full" in
> > > > the Oozie webapp logs?
> > > >
> > > > What are the values of these parameters:
> > > >
> > > > - oozie.service.CallableQueueService.queue.size
> > > > - oozie.service.CallableQueueService.threads
> > > > - oozie.service.CallableQueueService.callable.concurrency
> > > >
> > > > Regards,
> > > >
> > > > Andras
> > > >
> > > > On Tue, Jun 6, 2017 at 9:04 AM, Dongying Jiao <pineapple...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > > I have an Oozie coordinator job that runs at 02:00 every day.
> > > > > Sometimes the job runs smoothly, but sometimes it is stuck in PREP
> > > > > state for a long time.
> > > > >
> > > > > This is part of my coordinator.xml:
> > > > > <coordinator-app name="CoordinatorForETL"
> > > > >     frequency="${coordinatorFrequency}"
> > > > >     start="${startTime}" end="${endTime}" timezone="America/New_York"
> > > > >     xmlns="uri:oozie:coordinator:0.2">
> > > > >   <controls>
> > > > >     <timeout>10</timeout>
> > > > >     <concurrency>1</concurrency>
> > > > >   </controls>
> > > > >   <action>
> > > > >     <workflow>
> > > > >     .............
> > > > >
> > > > > This is part of the workflow.xml:
> > > > > ......
> > > > > <start to="flowDecision"/>
> > > > > <decision name="flowDecision">
> > > > >   <switch>
> > > > >     <case to="q1">${workflowType eq "etl" || workflowType eq "all"}</case>
> > > > >     <case to="prediction">${workflowType eq "prediction"}</case>
> > > > >     <case to="errorOnDecision">${workflowType eq "cleaning"}</case>
> > > > >     <default to="errorOnDecision"/>
> > > > >   </switch>
> > > > > </decision>
> > > > > .......
> > > > >
> > > > > In my latest run, the job stayed in PREP state for about 30 minutes.
> > > > > According to the Oozie log, the "start" node of the job completed at
> > > > > 02:00, but the "flowDecision" node did not start executing until
> > > > > 02:32. During that period I can see in the log that other Oozie jobs
> > > > > were running, but I didn't find any error or exception.
> > > > >
> > > > > My understanding is that an Oozie job in PREP state has not been
> > > > > submitted to YARN yet, which is why I can't find an application ID
> > > > > on YARN. I wonder whether this relates to Oozie's queue mechanism or
> > > > > concurrency control. If so, do you have experience tuning them?
> > > > >
> > > > > Thanks a lot.
> > > > >
> > > > > Best Regards,
> > > > > Dong Ying
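For reference, here is a minimal sketch of how Attila's example figures from this thread could be written into oozie-site.xml, assuming the usual Hadoop-style configuration/property layout that file uses. The values 100 and 30 are only the example numbers from his reply, not tuned recommendations for any particular cluster, and oozie.service.CallableQueueService.queue.size is deliberately left out because no value was suggested in the thread.

```xml
<!-- oozie-site.xml (sketch): CallableQueueService overrides.
     Values are the example figures from this thread (threads=100,
     callable.concurrency=30); treat them as a starting point to tune. -->
<configuration>
    <!-- Number of worker threads that execute queued callables -->
    <property>
        <name>oozie.service.CallableQueueService.threads</name>
        <value>100</value>
    </property>
    <!-- Max concurrent callables of the same type (e.g. the [switch]
         callable named in the WARN lines); raise together with threads -->
    <property>
        <name>oozie.service.CallableQueueService.callable.concurrency</name>
        <value>30</value>
    </property>
</configuration>
```

The Oozie server reads oozie-site.xml at startup, so it has to be restarted before new values take effect. As Attila notes, a larger thread pool may also call for a bigger Oozie server heap.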