Update:
It seems that increasing the resources allocated to Airflow (both CPU and
memory) does improve its performance.
In case this is helpful to someone: we have now allocated 2 cores and 20g of
memory to Airflow, and we currently run about 1000 tasks every hour. We
will run more in the future, so performance may still become an issue.
The numbers used to be 1 core and 8g, when the queued -> running transition
was very slow.


Thanks,
Harish

On Wed, Dec 7, 2016 at 2:31 PM, harish singh <[email protected]>
wrote:

> Any pointers on this?
> How can I move the jobs from Queued to Running faster?
>
> Thanks,
> Harish
>
> On Tue, Dec 6, 2016 at 2:01 PM, harish singh <[email protected]>
> wrote:
>
>> Hi Guys,
>>
>> Doing a month backfill for all the pipelines has brought up some issues,
>> which we may not have noticed before.
>>
>> One of the issues I am seeing is:
>> We use airflow pools.
>> From what I currently see in the UI, we have a pool named, say, "pool_1",
>> which has "Queued Slots" = 30
>> and "Used Slots" = 5.
>> Also, total available slots = 30.
>> So this means that the next time the scheduler heartbeats, at least 25 tasks
>> should be moved to occupy the unused slots, right?
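
The expectation above can be written out as a small calculation. This is only a sketch of the arithmetic in the message, not of how the Airflow scheduler actually decides what to run; the function name `expected_promotions` is made up for illustration.

```python
def expected_promotions(total_slots: int, used_slots: int, queued: int) -> int:
    """Number of queued tasks that could start if every open slot were filled.

    Hypothetical helper: it mirrors the reasoning in the message, not any
    Airflow API or the scheduler's real behaviour.
    """
    open_slots = max(total_slots - used_slots, 0)
    # Cannot start more tasks than are queued, nor more than there are open slots.
    return min(open_slots, queued)


# Numbers quoted from the pool page: 30 slots in total, 5 used, 30 queued.
print(expected_promotions(total_slots=30, used_slots=5, queued=30))
```

With the numbers from the pool page this gives 25, which is the figure the question is based on.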
>>
>> The heartbeats have been set very low:
>> job_heartbeat_sec = 2
>> scheduler_heartbeat_sec = 2
>>
>> Originally, I had them both at 10 sec. But I am kind of irritated by how slow
>> things have been.
>>
>> Strictly speaking, from the scheduler's point of view, jobs should move from
>> "Queued" to "Running" (and occupy a "Used" slot) every 2 seconds
>> (scheduler_heartbeat_sec).
>>
>>
>> These are the parallelism numbers I am using:
>>
>> parallelism = 64
>> dag_concurrency = 64
>> max_active_runs_per_dag = 16
>>
>> I have not seen 64 tasks running at the same time yet, although I have
>> seen around 40-50 in the "Queued" state. They just do not roll over to
>> "Running" when the next heartbeat arrives.
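
One thing that may be worth checking is which limit is the tightest. In Airflow, `parallelism` caps running task instances across the installation, `dag_concurrency` caps them per DAG, and a pool caps the tasks assigned to it. The sketch below takes the smallest of these limits; the helper `concurrency_ceiling` is hypothetical, and the second call assumes that every task is assigned to "pool_1", which the message does not state.

```python
from typing import Optional


def concurrency_ceiling(parallelism: int,
                        dag_concurrency: int,
                        n_dags: int,
                        pool_slots: Optional[int] = None) -> int:
    """Upper bound on simultaneously running tasks under the given limits.

    Hypothetical helper for illustration only; the real scheduler applies
    further conditions (dependencies, active DAG runs, executor capacity).
    """
    limits = [parallelism, dag_concurrency * n_dags]
    if pool_slots is not None:
        # Assumption: all tasks share a single pool with this many slots.
        limits.append(pool_slots)
    return min(limits)


# Settings from the message, with roughly 10 pipelines (DAGs).
print(concurrency_ceiling(parallelism=64, dag_concurrency=64, n_dags=10))
# Same settings if every task were in a 30-slot pool (an assumption).
print(concurrency_ceiling(parallelism=64, dag_concurrency=64, n_dags=10,
                          pool_slots=30))
```

Under that assumption the pool, not `parallelism`, would be the binding limit, so 64 concurrent tasks would never be reached regardless of the heartbeat settings.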
>>
>>
>> There are around 10 hourly pipelines, each with around 15 tasks.
>> The backfill is progressing at a pace of 600 tasks per hour.
>> I would really like to get this number to 60,000/hour.
>> I was hoping to complete the backfill within a day or two, but I think this is
>> going to take a week.
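
The "week" estimate can be checked with rough arithmetic. This sketch assumes a 30-day month, exactly 10 pipelines with 15 tasks each, and one run per pipeline per hour; the message only gives approximate figures, so the result is an order-of-magnitude check.

```python
# Assumed round numbers; the message says "around 10" and "around 15".
PIPELINES = 10
TASKS_PER_RUN = 15
RUNS_PER_DAY = 24      # hourly pipelines
BACKFILL_DAYS = 30     # assumption: "a month" taken as 30 days

total_tasks = PIPELINES * TASKS_PER_RUN * RUNS_PER_DAY * BACKFILL_DAYS


def days_to_finish(tasks: int, tasks_per_hour: float) -> float:
    """Wall-clock days needed to run `tasks` at a steady hourly rate."""
    return tasks / tasks_per_hour / 24


def rate_for_deadline(tasks: int, deadline_days: float) -> float:
    """Steady tasks-per-hour rate needed to finish within `deadline_days`."""
    return tasks / (deadline_days * 24)


print(total_tasks)                          # tasks in the whole backfill
print(days_to_finish(total_tasks, 600))     # at the current pace
print(rate_for_deadline(total_tasks, 2))    # pace needed to finish in two days
```

Under these assumptions the backfill is 108,000 tasks, which takes 7.5 days at 600 tasks per hour, and finishing in two days would need about 2,250 tasks per hour.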
>>
>>
>> I looked at backend services:
>> They are mostly sitting idle for minutes (sometimes 5 minutes)
>> before they get a request.
>>
>> I am not sure if my configurations are right.
>> Has someone faced this before? Any suggestions for me?
>>
>> Currently, one of the bottlenecks I am observing is the time taken to move
>> a task from the "Queued" to the "Used" stage (on the pool page of the UI).
>>
>>
>> Thanks,
>> Harish
>>
>>
>>
>>
>>
>
