Hi Bill,

Cluster setup (AWS):

1 Mesos, 1 ZK, 1 Aurora instance: 4 CPU, 16G mem

Aurora JVM heap (Xmx): 14G

100-node agent cluster: 40 CPU, 160G mem each

8000 jobs, each with 2 instances, so ~16K containers (tasks) in total
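
To put the query times from the log quoted below in perspective, here is a minimal sketch that pulls the "Query took N ms" values out of the pasted log and converts them to minutes. The function name and the regex are my own illustration, not anything from Aurora; it assumes the log text may still carry mail quote markers and wrapped lines. For the two lines in my original mail it gives roughly 20.7 and 23.0 minutes.

```python
import re

# Matches the "Query took <N> ms" fragment seen in the DbTaskStore log lines.
QUERY_TOOK = re.compile(r"Query took (\d+) ms")


def query_durations_minutes(log_text):
    """Return the duration of each logged query, in minutes.

    Hypothetical helper for illustration only; not part of Aurora.
    """
    # Strip leading mail quote markers ("> ", ">> ") from every line.
    unquoted = re.sub(r"^[> ]+", "", log_text, flags=re.MULTILINE)
    # Collapse all whitespace so log entries wrapped by the mail client
    # ("Query\ntook 1243517 ms") match as a single phrase.
    flat = " ".join(unquoted.split())
    return [int(ms) / 60000 for ms in QUERY_TOOK.findall(flat)]
```

Usage: pass the pasted log excerpt as one string and round the results as needed, e.g. `[round(m, 1) for m in query_durations_minutes(text)]`.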


Thanks,
Sham



> On Jun 8, 2016, at 9:18 PM, Bill Farner <wfar...@apache.org> wrote:
> 
> Can you give some insight into the machine specs and JVM options used?
> 
> Also, is it 8000 jobs or tasks?  The terms are often mixed up, but it
> makes a big difference here.
> 
> On Wednesday, June 8, 2016, Shyam Patel <sham.pate...@gmail.com> wrote:
> 
>> Hi,
>> 
>> While running LnP testing, I’m spinning up 8K Docker jobs. During the run,
>> I ran into an issue where the TaskStatUpdaterService and TaskReconciler
>> queries take a very long time. While they run, Aurora pretty much freezes
>> and eventually dies.  I also tried the same run without the Docker jobs and
>> hit the same issue.
>> 
>> 
>> Is there a way to keep Aurora responsive while these queries are running?
>> 
>> 
>> 
>> Here is a snippet from the log:
>> 
>> 
>> I0602 00:53:37.527 [TaskStatUpdaterService RUNNING, DbTaskStore:104] Query
>> took 1243517 ms: TaskQuery(owner:null, role:null, environment:null,
>> jobName:null, taskIds:null, statuses:[STARTING, THROTTLED, RUNNING,
>> DRAINING, ASSIGNED, KILLING, RESTARTING, PENDING, PREEMPTING],
>> instanceIds:null, slaveHosts:null, jobKeys:null, offset:0, limit:0)
>> 
>> 
>> I0602 00:56:54.180 [TaskReconciler-0, DbTaskStore:104] Query took 1380169
>> ms: TaskQuery(owner:null, role:null, environment:null, jobName:null,
>> taskIds:null, statuses:[STARTING, RUNNING, DRAINING, ASSIGNED, KILLING,
>> RESTARTING, PREEMPTING], instanceIds:null, slaveHosts:null, jobKeys:null,
>> offset:0, limit:0)
>> 
>> 
>> 
>> I'd appreciate any insights.
>> 
>> 
>> Thanks,
>> Sham
>> 
>> 
