Re: Aurora performance impact with hourly query runs

Shyam Patel Thu, 09 Jun 2016 09:56:04 -0700

Thanks Maxim,

If we move to the in-memory task store, would a restart of Aurora lose the
data? (BTW, I’m running Aurora in a container.)




> On Jun 9, 2016, at 8:37 AM, Maxim Khutornenko <[email protected]> wrote:
> 
> There are plenty of factors that may contribute towards the behavior
> you're observing. Based on the logs though it appears you are using
> DBTaskStore (-use_beta_db_task_store=true)? If so, you may want to
> revert to the default in-mem task store
> (-use_beta_db_task_store=false) as DBTaskStore is known to perform
> subpar on large task counts. This is a known issue and we plan to
> invest into making it faster.
> 
> On Thu, Jun 9, 2016 at 6:58 AM, Erb, Stephan
> <[email protected]> wrote:
>> I am no expert here, but I would assume that slow task store operations 
>> could result from a slow replicated log. Have you tried keeping it on an 
>> SSD? 
>> (https://github.com/apache/aurora/blob/e89521f1eebd9a5301eb02e2ed6ffebdecd54c9a/docs/operations/configuration.md#-native_log_file_path)
>> 
>> FWIW, there was a recent RB by Maxim to reduce Master load under task 
>> reconciliation: https://reviews.apache.org/r/47373/diff/2#index_header
>> ________________________________________
>> From: Shyam Patel <[email protected]>
>> Sent: Thursday, June 9, 2016 07:48
>> To: [email protected]
>> Subject: Re: Aurora performance impact with hourly query runs
>> 
>> Hi Bill,
>> 
>> Cluster Set up : AWS
>> 
>> 1 Mesos , 1 ZK , 1 Aurora instance : 4 CPU, 16G mem
>> 
>> Aurora : Xmx 14G
>> 
>> 100 nodes agent cluster : 40 CPU, 160G mem each
>> 
>> 8000 Jobs, each with 2 instances. So, total ~16K containers
>> 
>> 
>> Thanks,
>> Sham
>> 
>> 
>> 
>>> On Jun 8, 2016, at 9:18 PM, Bill Farner <[email protected]> wrote:
>>> 
>>> Can you give some insight into the machine specs and JVM options used?
>>> 
>>> Also, is it 8000 jobs or tasks?  The terms are often mixed up, but the
>>> distinction makes a big difference here.
>>> 
>>> On Wednesday, June 8, 2016, Shyam Patel <[email protected]> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> While running LnP testing, I’m spinning up 8K Docker jobs. During the run,
>>>> I ran into an issue where TaskStatUpdate and TaskReconciler queries take a
>>>> really long time. During that time, Aurora pretty much freezes and at some
>>>> point dies.  I also tried the same run without the Docker jobs and faced
>>>> the same issue.
>>>> 
>>>> 
>>>> Is there a way to keep Aurora's performance intact during the query runs?
>>>> 
>>>> 
>>>> 
>>>> Here is a snippet from the log:
>>>> 
>>>> 
>>>> I0602 00:53:37.527 [TaskStatUpdaterService RUNNING, DbTaskStore:104] Query
>>>> took 1243517 ms: TaskQuery(owner:null, role:null, environment:null,
>>>> jobName:null, taskIds:null, statuses:[STARTING, THROTTLED, RUNNING,
>>>> DRAINING, ASSIGNED, KILLING, RESTARTING, PENDING, PREEMPTING],
>>>> instanceIds:null, slaveHosts:null, jobKeys:null, offset:0, limit:0)
>>>> 
>>>> 
>>>> I0602 00:56:54.180 [TaskReconciler-0, DbTaskStore:104] Query took 1380169
>>>> ms: TaskQuery(owner:null, role:null, environment:null, jobName:null,
>>>> taskIds:null, statuses:[STARTING, RUNNING, DRAINING, ASSIGNED, KILLING,
>>>> RESTARTING, PREEMPTING], instanceIds:null, slaveHosts:null, jobKeys:null,
>>>> offset:0, limit:0)
>>>> 
>>>> 
>>>> 
>>>> I'd appreciate any insights.
>>>> 
>>>> 
>>>> Thanks,
>>>> Sham
>>>> 
>>>>
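
For anyone reading the log excerpt quoted above: the two "Query took N ms" entries correspond to roughly 20.7 and 23.0 minutes per query. The following is a minimal sketch, not part of Aurora, of how such entries could be pulled out of a scheduler log and converted to minutes. The function name `slow_queries`, the threshold parameter, and the regular expression are assumptions made for illustration; the pattern is derived only from the two log lines shown in this thread, and the sample abbreviates the TaskQuery arguments.

```python
import re

# Pattern inferred from the two log lines quoted in this thread (an assumption,
# not a documented Aurora log format): "[<source>, DbTaskStore:<line>] Query took <N> ms".
QUERY_TIME = re.compile(r"\[(?P<source>[^,\]]+)[^\]]*\] Query took (?P<ms>\d+) ms")


def slow_queries(log_text, threshold_ms=1000):
    """Return (source, minutes) pairs for queries at or above threshold_ms."""
    # Collapse whitespace so entries wrapped by a mail client match on one line.
    flat = " ".join(log_text.split())
    results = []
    for match in QUERY_TIME.finditer(flat):
        ms = int(match.group("ms"))
        if ms >= threshold_ms:
            minutes = round(ms / 60000, 1)  # 60,000 ms per minute
            results.append((match.group("source").strip(), minutes))
    return results


# Sample taken from the thread, with the TaskQuery arguments abbreviated.
sample = """
I0602 00:53:37.527 [TaskStatUpdaterService RUNNING, DbTaskStore:104] Query
took 1243517 ms: TaskQuery(...)
I0602 00:56:54.180 [TaskReconciler-0, DbTaskStore:104] Query took 1380169
ms: TaskQuery(...)
"""

print(slow_queries(sample))
# [('TaskStatUpdaterService RUNNING', 20.7), ('TaskReconciler-0', 23.0)]
```

Quote prefixes such as ">>>> " would need to be stripped before running this on text copied from an email rather than from the log file itself.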
