Hi Bill,

Cluster setup: AWS
1 Mesos, 1 ZK, 1 Aurora instance: 4 CPU, 16G mem
Aurora: -Xmx14G
Agent cluster: 100 nodes, 40 CPU and 160G mem each
8000 jobs, each with 2 instances, so ~16K containers in total

Thanks,
Sham

> On Jun 8, 2016, at 9:18 PM, Bill Farner <wfar...@apache.org> wrote:
>
> Can you give some insight into the machine specs and JVM options used?
>
> Also, is it 8000 jobs or tasks? The terms are often mixed up, but the
> distinction makes a big difference here.
>
> On Wednesday, June 8, 2016, Shyam Patel <sham.pate...@gmail.com> wrote:
>
>> Hi,
>>
>> While running LnP testing, I'm spinning up 8K Docker jobs. During the run,
>> I ran into an issue where TaskStatUpdater and TaskReconciler queries take
>> a very long time. While they run, Aurora pretty much freezes and at some
>> point dies. I also tried the same run without the Docker jobs and hit the
>> same issue.
>>
>> Is there a way to keep Aurora's performance intact while these queries
>> run?
>>
>> Here is a snippet from the log:
>>
>> I0602 00:53:37.527 [TaskStatUpdaterService RUNNING, DbTaskStore:104] Query
>> took 1243517 ms: TaskQuery(owner:null, role:null, environment:null,
>> jobName:null, taskIds:null, statuses:[STARTING, THROTTLED, RUNNING,
>> DRAINING, ASSIGNED, KILLING, RESTARTING, PENDING, PREEMPTING],
>> instanceIds:null, slaveHosts:null, jobKeys:null, offset:0, limit:0)
>>
>> I0602 00:56:54.180 [TaskReconciler-0, DbTaskStore:104] Query took 1380169
>> ms: TaskQuery(owner:null, role:null, environment:null, jobName:null,
>> taskIds:null, statuses:[STARTING, RUNNING, DRAINING, ASSIGNED, KILLING,
>> RESTARTING, PREEMPTING], instanceIds:null, slaveHosts:null, jobKeys:null,
>> offset:0, limit:0)
>>
>> Appreciate any insights.
>>
>> Thanks,
>> Sham
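For reference, the sizing and timing figures in this thread can be checked with a small Java sketch. It multiplies out the job and agent numbers given above and pulls the durations out of the two DbTaskStore log lines. The class and method names (CapacityCheck, totalContainers, queryMillis, millisToMinutes) are hypothetical, and the per-container averages assume agent capacity is spread evenly across containers, which the thread does not state.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for sanity-checking the numbers quoted in this thread.
public class CapacityCheck {
    // Matches the "Query took <N> ms" fragment of the DbTaskStore log lines.
    private static final Pattern QUERY_TOOK = Pattern.compile("Query took (\\d+) ms");

    // Each Aurora job instance runs as one task/container.
    public static int totalContainers(int jobs, int instancesPerJob) {
        return jobs * instancesPerJob;
    }

    // Extracts the query duration in milliseconds, if the line contains one.
    public static OptionalLong queryMillis(String logLine) {
        Matcher m = QUERY_TOOK.matcher(logLine);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    public static double millisToMinutes(long millis) {
        return millis / 60_000.0;
    }

    public static void main(String[] args) {
        // Figures from the thread: 100 agents, 40 CPU and 160G mem each; 8000 jobs x 2 instances.
        int agents = 100, cpusPerAgent = 40, memGPerAgent = 160;
        int containers = totalContainers(8000, 2);
        int totalCpus = agents * cpusPerAgent;
        int totalMemG = agents * memGPerAgent;
        System.out.printf("containers=%d cpus=%d memG=%d%n", containers, totalCpus, totalMemG);

        // Upper bound per container, assuming (hypothetically) even packing across agents.
        System.out.printf("avg per container: %.2f CPU, %.2f G%n",
                (double) totalCpus / containers, (double) totalMemG / containers);

        // Log lines from the thread, truncated after the duration (TaskQuery details omitted).
        String[] lines = {
            "I0602 00:53:37.527 [TaskStatUpdaterService RUNNING, DbTaskStore:104] Query took 1243517 ms",
            "I0602 00:56:54.180 [TaskReconciler-0, DbTaskStore:104] Query took 1380169 ms"
        };
        for (String line : lines) {
            queryMillis(line).ifPresent(ms ->
                    System.out.printf("%d ms = %.1f min%n", ms, millisToMinutes(ms)));
        }
    }
}
```

Running main prints the 16,000 container count, a ceiling of roughly 0.25 CPU and 1G of memory per container if the agents were evenly packed, and shows that the two task-store queries took about 20.7 and 23.0 minutes respectively.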