Hi folks, thank you for all the questions you raised. They helped us
investigate this issue. After some time testing the webserver locally, we
found that the problem was mostly caused by a DAG with 54 tasks, which was
the main cost when requesting a page: parsing its tasks took 1.5 seconds.
Add some connection overhead on top of that and a long wait was inevitable.
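If you want to reproduce this kind of measurement, here is a minimal sketch. It assumes the DAG is defined in a standalone Python file and simply executes that file repeatedly with `runpy.run_path`, timing each run. This is not how the Airflow webserver parses DAGs internally, only a rough stand-in for "how long does this file take to load". The helper name `time_dag_file_parse` is hypothetical.

```python
import runpy
import statistics
import time


def time_dag_file_parse(path: str, repeats: int = 5) -> float:
    """Return the median wall-clock time, in seconds, to execute a DAG file.

    Hypothetical helper: it runs the file's top-level code (DAG and task
    creation) in a fresh namespace on each repeat.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Executes the whole file. Modules it imports are cached in
        # sys.modules after the first run, so later runs mostly measure
        # the file's own top-level code rather than import cost.
        runpy.run_path(path)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

Usage, with a hypothetical path: `print(time_dag_file_parse("dags/my_dag.py"))`. Taking the median over several runs reduces noise from the first, cold run, which also touches on the warm-up question raised further down the thread.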

Sorry for the alarm; it was our fault. But I am glad we learned from it. 😁

On Wed, Aug 9, 2017 at 16:43, Bolke de Bruin <[email protected]> wrote:

> Did you run a tcpdump? Did you test with another web server? Did you try
> putting something like Nginx in front, which is just better at web
> serving? (We do this too, to add SSL, and we don't see this issue.) Is your
> client host name resolvable? (Logging usually reports host names; if a
> reverse DNS lookup cannot be done, it will take time to time out.)
>
>
> > On 9 Aug 2017 at 20:48, Victor Duarte Diniz Monteiro <[email protected]> wrote:
> >
> > Hi Max,
> >
> > We have 3693 task instances in the database.
> > We created an index on start_date for the task_instance table, as you
> > suggested, but it is still slow. We don't think the problem is in the
> > database, because ad hoc queries return quickly.
> >
> > On Wed, Aug 9, 2017 at 12:47, Maxime Beauchemin <[email protected]> wrote:
> >
> >> It seems like the default sort should be on start_date, descending, and
> >> yes, there should be an index on that.
> >>
> >> Also 100 per page is probably enough.
> >>
> >> Can you try [something like] that in your environment and report on
> >> loading times?
> >>
> >> Also, for context, how many task instances do you have in total?
> >>
> >> Max
> >>
> >> On Tue, Aug 8, 2017 at 12:07 PM, Victor Monteiro <[email protected]> wrote:
> >>
> >>> [Screenshot: Screen Shot 2017-08-08 at 2.42.55 PM.png]
> >>>
> >>> I am sending the image one more time, hosted on Google Drive:
> >>> https://drive.google.com/a/ubee.in/file/d/0B7u1tjyaPWJQeVVtbFIzcS11eWc/view?usp=drivesdk
> >>>
> >>> Edgar, did you find any solution to speed up the webserver?
> >>>
> >>> On Tue, Aug 8, 2017 at 16:04, Edgar Rodriguez <[email protected]> wrote:
> >>>
> >>>> I've been profiling the web UI for the last few days and I think I've
> >>>> been able to identify some of the issues. I've seen similar response
> >>>> times from the webserver.
> >>>> A couple of things I found specifically for the task instance view:
> >>>> 1. Page sizes on views are usually too large, and all HTML rendering is
> >>>> done server side; flask_admin introduces some latency rendering the
> >>>> templates for 500 TIs at a time in the TaskInstanceModelView, see
> >>>> [AIRFLOW-1483 <https://issues.apache.org/jira/browse/AIRFLOW-1483>]
> >>>> 2. An unindexed column is used as the default ordering (required for
> >>>> paging), which triggers a sort on TI requests; e.g. TaskInstanceModelView
> >>>> uses `job_id` as the default sort column, but there's no index on it,
> >>>> see [AIRFLOW-1495 <https://issues.apache.org/jira/browse/AIRFLOW-1495>]
> >>>>
> >>>> Cheers,
> >>>> Edgar
> >>>>
> >>>> On Tue, Aug 8, 2017 at 11:56 AM, Victor Monteiro <[email protected]> wrote:
> >>>>
> >>>>> Sorry, I am sending again.
> >>>>>
> >>>>> Also, it is always between 3 and 6 seconds.
> >>>>>
> >>>>>
> >>>>> On Tue, Aug 8, 2017 at 15:21, Ash Berlin-Taylor <[email protected]> wrote:
> >>>>>
> >>>>>> (Your screenshot didn't come through for me, possibly because the
> >>>>>> list stripped it? That said:)
> >>>>>>
> >>>>>> Is it always 6 seconds, or does it settle down after a few requests,
> >>>>>> enough that each worker has had a chance to load the app and its
> >>>>>> dependencies?
> >>>>>>
> >>>>>> i.e. the problem might just be warm-up.
> >>>>>>
> >>>>>> -ash
> >>>>>>> On 8 Aug 2017, at 18:52, Victor Monteiro <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Hi everyone.
> >>>>>>>
> >>>>>>> The problem is very straightforward. When I make a request to the
> >>>>>>> Airflow webserver, it takes too long to send the first byte.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> As you can see in the picture, it took 6 seconds to send the first
> >>>>>>> byte. I already investigated the connection to the database, and it
> >>>>>>> took 36 ms to list all task instances. So I am starting to think there
> >>>>>>> is a problem with the Airflow webserver or with my deployment.
> >>>>>>>
> >>>>>>> To give you more details about the deployment and configuration:
> >>>>>>> web_server_worker_timeout = 120
> >>>>>>> workers = 4
> >>>>>>> sql_alchemy_pool_size = 5
> >>>>>>> sql_alchemy_pool_recycle = 3600
> >>>>>>> AWS RDS PostgreSQL
> >>>>>>> AWS m4.large
> >>>>>>> Does anyone know what could be causing this problem?
> >>>>>>>
> >>>>>>> Thank you :D
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>
> >>
>
