Any pointers on this? How can I move the jobs from Queued to Running faster?

Thanks,
Harish
On Tue, Dec 6, 2016 at 2:01 PM, harish singh <[email protected]> wrote:
> Hi Guys,
>
> Doing a month-long backfill for all the pipelines has brought up some issues
> that we may not have noticed before.
>
> One of the issues I am seeing is this:
> We use Airflow pools.
> From what I currently see in the UI, we have a pool named, say, "pool_1",
> which has "Queued Slots" = 30
> and "Used Slots" = 5.
> Also, the total available slots = 30.
> So this means that the next time the scheduler heartbeats, at least 25 tasks
> should be moved to occupy the unused slots, right?
>
> The heartbeats have been set very low:
> job_heartbeat_sec = 2
> scheduler_heartbeat_sec = 2
>
> Originally, I had them both at 10 sec, but I am kind of irritated by how slow
> things have been.
>
> Strictly speaking from the scheduler's point of view, scheduling should move
> the jobs from "Queued" to "Running" (and occupy a "Used" slot) every 2 seconds
> (scheduler_heartbeat_sec).
>
> These are the parallelism numbers I am using:
>
> parallelism = 64
> dag_concurrency = 64
> max_active_runs_per_dag = 16
>
> I have not yet seen 64 tasks running at the same time, although I have seen
> around 40-50 in the "Queued" state. They just don't roll over to "Running"
> when the next heartbeat arrives.
>
> There are around 10 hourly pipelines, each with around 15 tasks.
> It is progressing at a pace of 600 tasks per hour.
> I would really like to get this number to 60,000/hour.
> I was hoping to complete the backfill within a day or two, but I think it is
> going to take a week.
>
> I looked at the backend services:
> they mostly sit idle for minutes (sometimes 5 minutes)
> before they get a request.
>
> I am not sure if my configuration is right.
> Has anyone faced this before? Any suggestions for me?
>
> Currently, one of the bottlenecks I am observing is the time taken to move
> a task from "Queued" -> "Used" (on the pool page of the UI).
>
> Thanks,
> Harish
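For a sense of scale, the numbers in the message above can be turned into a quick back-of-envelope estimate. This is only a minimal Python sketch, not anything from Airflow itself; it assumes "a month" means 30 days of hourly runs and that every run of every pipeline has all ~15 tasks, and the constant and function names are made up for illustration. Under those assumptions it works out to 108,000 task instances: about 180 hours (7.5 days) at the current 600 tasks/hour, under two hours at 60,000/hour, and roughly 2,250 tasks/hour to finish within two days. At a 2-second heartbeat, 600 tasks/hour is only about one task getting through per three heartbeats on average.

```python
"""Back-of-envelope backfill estimate from the numbers in this thread.

Assumptions (not stated in the thread): "a month" = 30 days of hourly runs,
and every run of every pipeline contains all of its ~15 tasks.
All names below are hypothetical, for illustration only; none come from Airflow.
"""

PIPELINES = 10             # "around 10 hourly pipelines"
TASKS_PER_PIPELINE = 15    # "each with around 15 tasks"
RUNS_PER_DAY = 24          # hourly schedule
BACKFILL_DAYS = 30         # assumed length of "a month"

POOL_TOTAL_SLOTS = 30      # total slots in "pool_1"
POOL_USED_SLOTS = 5        # slots in use according to the pool page
SCHEDULER_HEARTBEAT_SEC = 2


def total_backfill_tasks() -> int:
    """Task instances needed to backfill every pipeline over the whole period."""
    return PIPELINES * TASKS_PER_PIPELINE * RUNS_PER_DAY * BACKFILL_DAYS


def hours_to_finish(tasks: int, tasks_per_hour: float) -> float:
    """Wall-clock hours needed at a steady throughput."""
    return tasks / tasks_per_hour


def required_rate(tasks: int, hours: float) -> float:
    """Tasks per hour needed to finish within the given number of hours."""
    return tasks / hours


def open_pool_slots(total: int, used: int) -> int:
    """Slots the pool could still hand out."""
    return total - used


def tasks_per_heartbeat(tasks_per_hour: float, heartbeat_sec: float) -> float:
    """Average number of tasks getting through per scheduler heartbeat."""
    heartbeats_per_hour = 3600 / heartbeat_sec
    return tasks_per_hour / heartbeats_per_hour


if __name__ == "__main__":
    tasks = total_backfill_tasks()
    print(f"task instances to backfill: {tasks}")
    for rate in (600, 60_000):
        hours = hours_to_finish(tasks, rate)
        print(f"at {rate}/hour: {hours:.1f} hours ({hours / 24:.1f} days)")
    print(f"rate needed to finish in 2 days: {required_rate(tasks, 48):.0f}/hour")
    print(f"open slots in pool_1: {open_pool_slots(POOL_TOTAL_SLOTS, POOL_USED_SLOTS)}")
    print(f"tasks per heartbeat at 600/hour: "
          f"{tasks_per_heartbeat(600, SCHEDULER_HEARTBEAT_SEC):.2f}")
```

The gap between 25 open pool slots and roughly a third of a task per heartbeat is what makes the Queued-to-Running delay look like the bottleneck rather than the pool size itself.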
