Re: Possible bug: Airflow frequently fail with AWS RDS backend when #tasks increases

Ricky Shi Sat, 15 Aug 2020 17:51:57 -0700

Thanks Brian. Your explanation does make sense and fits the symptom. What
did you do to fix the issue?
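
In case it helps anyone else who hits this: one mitigation that would fit
that explanation is to retry the hostname lookup with exponential backoff
instead of failing on the first DNS error. This is only a rough sketch and
an assumption on my side, not something confirmed in this thread as the
actual fix. The name resolve_with_backoff and its parameters are made up
for illustration; only socket.getaddrinfo and socket.gaierror come from the
Python standard library.

```python
import random
import socket
import time


def resolve_with_backoff(host, port, attempts=5, base_delay=0.5,
                         resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve a hostname, retrying with exponential backoff on DNS errors.

    Hypothetical helper: `resolver` and `sleep` are injectable so the
    retry logic can be exercised without a real network or real waiting.
    """
    for attempt in range(attempts):
        try:
            return resolver(host, port)
        except socket.gaierror:
            # Give up and re-raise once the last attempt has failed.
            if attempt == attempts - 1:
                raise
            # Double the wait each time and add jitter so that many tasks
            # do not all retry the lookup at the same instant.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

Backing off only hides the symptom. If the tasks really do open a large
number of new connections in a short time, reusing connections so that
fewer lookups happen in the first place is probably the better long-term
direction.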




On Sat, Aug 15, 2020 at 8:23 PM Brian Greene <
[email protected]> wrote:

> When I had a similar issue it turned out that the way the task(s) were
> written, they'd RAPIDLY open a large number of new RDS connections.
>
> AWS RDS, particularly if you're using the cluster endpoint, performs
> a DNS lookup (4 hops, if I recall correctly) before your
> connection request actually resolves to a real host.  This lookup is
> throttled, and after a certain number of hits in a short time, it will
> return the error above (which is annoying, as it makes it look like the DB
> just 'vanishes' from time to time).
>
> Brian
>
> On Sat, Aug 15, 2020 at 7:04 PM Ricky Shi <[email protected]> wrote:
>
> > Hi Everyone,
> >
> > We encountered a very strange issue with Airflow using AWS RDS as the
> > backend. We found that when the number of tasks is big enough (>60),
> > Airflow will fail with the error message (MySQL RDS backend):
> >
> > sqlalchemy.exc.OperationalError:
> > (MySQLdb._exceptions.OperationalError) (2005, "Unknown MySQL server
> > host ... $AWS RDS address)
> >
> > or (Postgres RDS backend):
> >
> > psycopg2.OperationalError: could not translate host name $AWS RDS address
> >
> >
> > When we restart airflow, it becomes fine; and the job scheduler & website
> > are both running fine. However, it will fail again after a couple of days
> > of smooth running, with the same error message.
> >
> > We found on Stack Overflow that other people are experiencing the same
> > issue, but no solution has been found. Does anyone know how to resolve
> > the issue?
> >
> > Thanks,
> >
> > --
> > Ricky Shi
> >
>


-- 
Ricky Shi
