MemTaskStore is the default.
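Concretely, the store is selected by a single scheduler command-line flag. A minimal sketch, assuming flags are passed straight to the scheduler process at startup (how they get there depends on your deployment):

    # Default: the in-memory MemTaskStore. Omitting the flag entirely has
    # the same effect as passing it explicitly.
    -use_beta_db_task_store=false

    # Opt-in: the DBTaskStore that produced the slow TaskQuery timings in
    # the logs quoted below.
    -use_beta_db_task_store=true

Switching back is safe: as Maxim notes further down, scheduler state lives in the Mesos replicated log and is re-inserted into whichever store is active after a restart.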
On Sunday, June 12, 2016, <meghdoo...@yahoo.com.invalid> wrote:

> Yes Maxim, really appreciate the tip. That's quite a difference.
> One follow-up question: any reason for not making MemTaskStore the
> default in Aurora?
>
> Thx
>
> > On Jun 12, 2016, at 9:48 AM, Shyam Patel <sham.pate...@gmail.com> wrote:
> >
> > The query performance improved drastically; it took only 29ms for 12K
> > jobs / 30K tasks (down from an hour!).
> >
> > Thanks, Maxim, for the quick lead, really appreciate your help.
> >
> > Thanks,
> > Sham
> >
> >> On Jun 9, 2016, at 10:06 AM, Maxim Khutornenko <ma...@apache.org> wrote:
> >>
> >> The scheduler persists its state in the Mesos replicated log regardless
> >> of the in-memory engine. If you change the flag and restart the
> >> scheduler, all tasks are going to be re-inserted into MemTaskStore
> >> instead of DBTaskStore. No data will be lost.
> >>
> >>> On Thu, Jun 9, 2016 at 9:55 AM, Shyam Patel <sham.pate...@gmail.com> wrote:
> >>> Thanks Maxim,
> >>>
> >>> If we move to the mem task store, would a restart of Aurora lose the
> >>> data? (BTW, I'm running Aurora in a container.)
> >>>
> >>>> On Jun 9, 2016, at 8:37 AM, Maxim Khutornenko <ma...@apache.org> wrote:
> >>>>
> >>>> There are plenty of factors that may contribute to the behavior
> >>>> you're observing. Based on the logs, though, it appears you are using
> >>>> DBTaskStore (-use_beta_db_task_store=true)? If so, you may want to
> >>>> revert to the default in-mem task store
> >>>> (-use_beta_db_task_store=false), as DBTaskStore is known to perform
> >>>> subpar at large task counts. This is a known issue and we plan to
> >>>> invest in making it faster.
> >>>>
> >>>> On Thu, Jun 9, 2016 at 6:58 AM, Erb, Stephan
> >>>> <stephan....@blue-yonder.com> wrote:
> >>>>> I am no expert here, but I would assume that slow task store
> >>>>> operations could result from a slow replicated log. Have you tried
> >>>>> keeping it on an SSD?
> >>>>> (https://github.com/apache/aurora/blob/e89521f1eebd9a5301eb02e2ed6ffebdecd54c9a/docs/operations/configuration.md#-native_log_file_path)
> >>>>>
> >>>>> FWIW, there was a recent RB by Maxim to reduce master load under
> >>>>> task reconciliation:
> >>>>> https://reviews.apache.org/r/47373/diff/2#index_header
> >>>>> ________________________________________
> >>>>> From: Shyam Patel <sham.pate...@gmail.com>
> >>>>> Sent: Thursday, June 9, 2016 07:48
> >>>>> To: dev@aurora.apache.org
> >>>>> Subject: Re: Aurora performance impact with hourly query runs
> >>>>>
> >>>>> Hi Bill,
> >>>>>
> >>>>> Cluster setup: AWS
> >>>>>
> >>>>> 1 Mesos, 1 ZK, 1 Aurora instance: 4 CPU, 16G mem
> >>>>>
> >>>>> Aurora: Xmx 14G
> >>>>>
> >>>>> 100-node agent cluster: 40 CPU, 160G mem each
> >>>>>
> >>>>> 8000 jobs, each with 2 instances. So, ~16K containers in total.
> >>>>>
> >>>>> Thanks,
> >>>>> Sham
> >>>>>
> >>>>>> On Jun 8, 2016, at 9:18 PM, Bill Farner <wfar...@apache.org> wrote:
> >>>>>>
> >>>>>> Can you give some insight into the machine specs and JVM options
> >>>>>> used?
> >>>>>>
> >>>>>> Also, is it 8000 jobs or tasks? The terms are often mixed up, but
> >>>>>> it makes a big difference here.
> >>>>>>
> >>>>>>> On Wednesday, June 8, 2016, Shyam Patel <sham.pate...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> While running LnP testing, I'm spinning up 8K Docker jobs.
> >>>>>>> During the run, I ran into an issue where the TaskStatUpdate and
> >>>>>>> TaskReconciler queries take a really long time. While that happens,
> >>>>>>> Aurora is pretty much frozen and at some point dies. I also tried
> >>>>>>> the same run without the Docker jobs and hit the same issue.
> >>>>>>>
> >>>>>>> Is there a way to keep Aurora's performance intact during these
> >>>>>>> query runs?
> >>>>>>>
> >>>>>>> Here is a snippet from the log:
> >>>>>>>
> >>>>>>> I0602 00:53:37.527 [TaskStatUpdaterService RUNNING, DbTaskStore:104] Query
> >>>>>>> took 1243517 ms: TaskQuery(owner:null, role:null, environment:null,
> >>>>>>> jobName:null, taskIds:null, statuses:[STARTING, THROTTLED, RUNNING,
> >>>>>>> DRAINING, ASSIGNED, KILLING, RESTARTING, PENDING, PREEMPTING],
> >>>>>>> instanceIds:null, slaveHosts:null, jobKeys:null, offset:0, limit:0)
> >>>>>>>
> >>>>>>> I0602 00:56:54.180 [TaskReconciler-0, DbTaskStore:104] Query took 1380169
> >>>>>>> ms: TaskQuery(owner:null, role:null, environment:null, jobName:null,
> >>>>>>> taskIds:null, statuses:[STARTING, RUNNING, DRAINING, ASSIGNED, KILLING,
> >>>>>>> RESTARTING, PREEMPTING], instanceIds:null, slaveHosts:null, jobKeys:null,
> >>>>>>> offset:0, limit:0)
> >>>>>>>
> >>>>>>> Appreciate any insights.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Sham
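One practical detail from Stephan's suggestion above that is easy to miss: the location of the scheduler's native replicated log is controlled by a single flag, and since the scheduler persists its state there regardless of which task store is in use (per Maxim's note), it is worth keeping on fast local storage. A sketch; the path below is illustrative rather than taken from Sham's setup, and how flags reach the scheduler depends on your deployment:

    # Keep the native replicated log on an SSD-backed local volume.
    # /var/lib/aurora/scheduler/db is only an example path; see the
    # configuration doc linked above for the flag's description.
    -native_log_file_path=/var/lib/aurora/scheduler/db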