Hi Selina,

I suspect that the problem is due to the following job parameter:

driver_jvm_args -Xmx4g

This would certainly be true if cgroups have been enabled for DUCC. The default cgroup size for a JD is 450MB, so specifying an Xmx of 4GB can cause the JVM to spill into swap space and cause erratic behavior.

Comparing a "fast" job (96) vs a "slow" job (97), the time to process the single work item was 8 sec vs 9 sec:

09 Mar 2016 08:46:08,556 INFO JobDriverHelper - T[20] summarize workitem statistics [sec] avg=8.14 min=8.14 max=8.14 stddev=.00
vs
09 Mar 2016 08:56:46,583 INFO JobDriverHelper - T[19] summarize workitem statistics [sec] avg=9.41 min=9.41 max=9.41 stddev=.00

The extra delay between the two jobs appears to be associated with the Job Driver.

Was there some reason you specified heap size for the JD? The default JD heap size is Xmx400m.

Regards,
Eddie

On Wed, Mar 9, 2016 at 2:41 PM, Selina Chu <selina....@gmail.com> wrote:

> Hi
>
> I’m kind of new to DUCC and this forum. I was hoping to see if someone
> could give me some insights as to why DUCC is behaving strangely and a bit
> unstable.
>
> So what I'm trying to do is: I’m using DUCC to process a cTAKES job.
> Currently DUCC is just using a single node. DUCC seems to act randomly in
> processing the jobs, varying between 4.5 minutes to 23 minutes, and I
> wasn’t running anything else that is CPU intensive. When I don’t use DUCC
> and use cTAKES alone, the times for processing are pretty consistent.
>
> To demonstrate this strange behavior in DUCC, I submitted the exact same
> job 10 times in a row (job ID 95-104), without modification to the
> settings.
> The duration for finishing each of the jobs are: 4:41, 4:43, 12:48, 8:41,
> 5:24, 4:38, 7:07, 23:08, 8:08, 20:37 (canceled by system). The first 9 jobs
> were completed and the last one got canceled. Even before the last job,
> the first 9 jobs were varying in duration times.
> After restarting DUCC a couple of times and resetting it, I submitted the
> same job (job ID 110), that job was completed without a problem (long
> processing time)
>
> I noticed that when a job takes a long time to finish, past 5 minutes, it
> seemed to be stuck at the “initializing” and “completing” states for the
> longest.
>
> It seems like DUCC is doing something randomly. I tried examining the log
> files, but they are all similar, except for the time between each state.
> (I’ve also placed the related logs and job file in a repo
> https://github.com/selinachu/Templogs, in case anyone is interested in
> examining them.)
>
> I’m baffled with the random behaviors from DUCC. I was hoping maybe someone
> could clarify this more for me.
>
> After completing a job, what does DUCC do? Does it save something in
> memory, which carries over to the next job, which probably relates to the
> initialization process? Are there some parameter settings that might
> alleviate this type of behavior?
>
> I would appreciate any insight. Thanks in advance for your help.
>
>
> Cheers,
> Selina Chu
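
For anyone who wants to repeat the fast/slow comparison, here is a rough Python sketch of the arithmetic. It pulls the work item average out of the two JobDriverHelper lines quoted in the reply and subtracts it from the total job duration. The pairing of durations with job IDs is an assumption: it takes the ten reported durations to be listed in job-ID order starting at 95, so job 96 is 4:43 and job 97 is 12:48. The helper names are made up for illustration and are not part of DUCC.

```python
import re

# JobDriverHelper lines copied from the reply (jobs 96 and 97).
LOG_LINES = {
    96: "09 Mar 2016 08:46:08,556 INFO JobDriverHelper - T[20] summarize "
        "workitem statistics [sec] avg=8.14 min=8.14 max=8.14 stddev=.00",
    97: "09 Mar 2016 08:56:46,583 INFO JobDriverHelper - T[19] summarize "
        "workitem statistics [sec] avg=9.41 min=9.41 max=9.41 stddev=.00",
}

# Assumption: the reported durations are in job-ID order starting at 95.
DURATIONS = {96: "4:43", 97: "12:48"}


def workitem_avg(line):
    """Return the avg= value (seconds) from a workitem statistics line."""
    match = re.search(r"avg=(\d*\.?\d+)", line)
    if match is None:
        raise ValueError("no avg= field in log line")
    return float(match.group(1))


def to_seconds(mmss):
    """Convert an M:SS duration string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def overhead(job_id):
    """Job time not spent processing the single work item, in seconds."""
    return to_seconds(DURATIONS[job_id]) - workitem_avg(LOG_LINES[job_id])


for job_id in sorted(LOG_LINES):
    print(f"job {job_id}: work item {workitem_avg(LOG_LINES[job_id]):.2f}s, "
          f"everything else {overhead(job_id):.2f}s")
```

Under that pairing, the work item itself differs by roughly a second between the two jobs, while the remaining time differs by several minutes, which is consistent with the delay sitting outside the work item processing.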