Actually, ignore my previous comment. It is set to true for the image I noticed the issue with.
I will test with it set to ‘false’ and get back with findings.

Thanks much!
_Sham

> On Jun 9, 2016, at 12:22 PM, Shyam Patel <sham.pate...@gmail.com> wrote:
>
> Actually, checking on the flag ‘use_beta_db_task_store’, it is false (default).
>
> INFO: use_beta_db_task_store
> (org.apache.aurora.scheduler.storage.db.DbModule.use_beta_db_task_store):
> false
> Jun 09, 2016 7:11:20 PM org.apache.aurora.common.args.ArgScanner process
>
> _Shyam
>
>> On Jun 9, 2016, at 10:06 AM, Maxim Khutornenko <ma...@apache.org> wrote:
>>
>> Scheduler persists its state in the Mesos replicated log regardless of
>> the in-memory engine. If you change the flag and restart scheduler all
>> tasks are going to be re-inserted into MemTaskStore instead of
>> DBTaskStore. No data will be lost.
>>
>> On Thu, Jun 9, 2016 at 9:55 AM, Shyam Patel <sham.pate...@gmail.com> wrote:
>>> Thanks Maxim,
>>>
>>> If we move to mem task store, would a restart of aurora lose the data? (btw,
>>> I’m running aurora in a container)
>>>
>>>> On Jun 9, 2016, at 8:37 AM, Maxim Khutornenko <ma...@apache.org> wrote:
>>>>
>>>> There are plenty of factors that may contribute towards the behavior
>>>> you're observing. Based on the logs though it appears you are using
>>>> DBTaskStore (-use_beta_db_task_store=true)? If so, you may want to
>>>> revert to the default in-mem task store
>>>> (-use_beta_db_task_store=false) as DBTaskStore is known to perform
>>>> subpar on large task counts. This is a known issue and we plan to
>>>> invest into making it faster.
>>>>
>>>> On Thu, Jun 9, 2016 at 6:58 AM, Erb, Stephan
>>>> <stephan....@blue-yonder.com> wrote:
>>>>> I am no expert here, but I would assume that slow task store operations
>>>>> could result from a slow replicated log. Have you tried keeping it on an
>>>>> SSD?
>>>>> (https://github.com/apache/aurora/blob/e89521f1eebd9a5301eb02e2ed6ffebdecd54c9a/docs/operations/configuration.md#-native_log_file_path)
>>>>>
>>>>> FWIW, there was a recent RB by Maxim to reduce Master load under task
>>>>> reconciliation: https://reviews.apache.org/r/47373/diff/2#index_header
>>>>> ________________________________________
>>>>> From: Shyam Patel <sham.pate...@gmail.com>
>>>>> Sent: Thursday, June 9, 2016 07:48
>>>>> To: dev@aurora.apache.org
>>>>> Subject: Re: Aurora performance impact with hourly query runs
>>>>>
>>>>> Hi Bill,
>>>>>
>>>>> Cluster Set up : AWS
>>>>>
>>>>> 1 Mesos , 1 ZK , 1 Aurora instance : 4 CPU, 16G mem
>>>>>
>>>>> Aurora : Xmx 14G
>>>>>
>>>>> 100 nodes agent cluster : 40 CPU, 160G mem each
>>>>>
>>>>> 8000 Jobs, each with 2 instances. So, total ~16K containers
>>>>>
>>>>> Thanks,
>>>>> Sham
>>>>>
>>>>>> On Jun 8, 2016, at 9:18 PM, Bill Farner <wfar...@apache.org> wrote:
>>>>>>
>>>>>> Can you give some insight into the machine specs and JVM options used?
>>>>>>
>>>>>> Also, is it 8000 jobs or tasks? The terms are often mixed up, but will
>>>>>> make a big difference here.
>>>>>>
>>>>>> On Wednesday, June 8, 2016, Shyam Patel <sham.pate...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> While running LnP testing, I’m spinning off 8K docker jobs. During the
>>>>>>> run, I ran into an issue where TaskStatUpdate and TaskReconciler queries
>>>>>>> take really long times. During that time, Aurora is pretty much freezing
>>>>>>> and at a point dying. Also, tried the same run w/o the docker jobs and
>>>>>>> faced the same issue.
>>>>>>>
>>>>>>> Is there a way to keep the Aurora performance intact during the query
>>>>>>> runs?
>>>>>>>
>>>>>>> Here is a snippet from the log:
>>>>>>>
>>>>>>> I0602 00:53:37.527 [TaskStatUpdaterService RUNNING, DbTaskStore:104]
>>>>>>> Query took 1243517 ms: TaskQuery(owner:null, role:null, environment:null,
>>>>>>> jobName:null, taskIds:null, statuses:[STARTING, THROTTLED, RUNNING,
>>>>>>> DRAINING, ASSIGNED, KILLING, RESTARTING, PENDING, PREEMPTING],
>>>>>>> instanceIds:null, slaveHosts:null, jobKeys:null, offset:0, limit:0)
>>>>>>>
>>>>>>> I0602 00:56:54.180 [TaskReconciler-0, DbTaskStore:104] Query took
>>>>>>> 1380169 ms: TaskQuery(owner:null, role:null, environment:null,
>>>>>>> jobName:null, taskIds:null, statuses:[STARTING, RUNNING, DRAINING,
>>>>>>> ASSIGNED, KILLING, RESTARTING, PREEMPTING], instanceIds:null,
>>>>>>> slaveHosts:null, jobKeys:null, offset:0, limit:0)
>>>>>>>
>>>>>>> Appreciate any insights.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sham
>>>>>>>
>>> >
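
For anyone comparing runs before and after flipping -use_beta_db_task_store, here is a rough Python sketch for pulling the "Query took N ms" durations out of a scheduler log. The regex is inferred only from the two DbTaskStore lines quoted in this thread, and the function name, threshold, and sample text are hypothetical; it assumes mail-quote prefixes have already been stripped and may need adjusting for other log formats.

```python
import re

# Pattern inferred from the two log lines quoted in this thread (an assumption,
# not an official Aurora log format): "[<source>, DbTaskStore:<line>] Query took <N> ms"
QUERY_RE = re.compile(
    r"\[(?P<source>[^,\]]+),\s*DbTaskStore:\d+\]\s+Query\s+took\s+(?P<ms>\d+)\s+ms"
)


def slow_queries(log_text, threshold_ms=1000):
    """Return (source, minutes) pairs for queries slower than threshold_ms."""
    # Long lines may have been re-wrapped, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    results = []
    for match in QUERY_RE.finditer(flat):
        ms = int(match.group("ms"))
        if ms > threshold_ms:
            minutes = round(ms / 60000, 1)  # 60000 ms per minute
            results.append((match.group("source").strip(), minutes))
    return results


# Hypothetical sample: the two entries from this thread, TaskQuery args shortened.
SAMPLE = """
I0602 00:53:37.527 [TaskStatUpdaterService RUNNING, DbTaskStore:104] Query
took 1243517 ms: TaskQuery(owner:null)
I0602 00:56:54.180 [TaskReconciler-0, DbTaskStore:104] Query took
1380169 ms: TaskQuery(owner:null)
"""

for source, minutes in slow_queries(SAMPLE):
    print(f"{source}: {minutes} min")
```

On the two entries above this reports roughly 20.7 and 23.0 minutes per query, which gives a baseline to compare against once the scheduler is restarted with the in-memory task store.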