Thanks Brian. Your explanation does make sense and fits the symptom. What did you do to fix the issue?
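
For anyone else hitting this, here is a minimal sketch of one possible mitigation, assuming the throttled lookup really is the cause: retry the hostname resolution with exponential backoff instead of failing on the first error. The helper name resolve_with_backoff and its parameters are hypothetical (they are not part of Airflow, SQLAlchemy, or any AWS library), and this is not necessarily what Brian did.

```python
import socket
import time


def resolve_with_backoff(host, port=3306, attempts=5, base_delay=0.5,
                         resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `host`, retrying with exponential backoff on DNS failures.

    Hypothetical helper: `resolver` and `sleep` are injectable so the
    behaviour can be exercised without touching the network.
    """
    for attempt in range(attempts):
        try:
            return resolver(host, port)
        except socket.gaierror:
            # Out of retries: surface the original lookup error.
            if attempt == attempts - 1:
                raise
            # Wait 0.5s, 1s, 2s, ... before the next lookup attempt.
            sleep(base_delay * (2 ** attempt))
```

Retrying only softens the symptom. If the tasks really do open many new connections in a burst, reusing connections (for example through a connection pool) would reduce the number of lookups in the first place.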

On Sat, Aug 15, 2020 at 8:23 PM Brian Greene <br...@heisenbergwoodworking.com> wrote:

> When I had a similar issue it turned out that the way the task(s) were
> written, they'd RAPIDLY open a large number of new RDS connections.
>
> AWS RDS - particularly if you're using the cluster endpoint - is
> performing a 'dns' lookup (4 hops if I recall correctly) before your
> connection request actually resolves to a real host. This lookup is
> throttled, and after a certain number of hits in a short time, it will
> return the error above (which is annoying, as it makes it look like the DB
> just 'vanishes' from time to time).
>
> Brian
>
> On Sat, Aug 15, 2020 at 7:04 PM Ricky Shi <xiao.x....@gmail.com> wrote:
>
> > Hi Everyone,
> >
> > We encountered a very strange issue with Airflow using AWS RDS as the
> > backend. We found that when the number of tasks is big enough (>60),
> > Airflow will fail with the error message (MySQL RDS backend):
> >
> > sqlalchemy.exc.OperationalError:
> > (MySQLdb._exceptions.OperationalError) (2005, "Unknown MySQL server
> > host ... $AWS RDS address)
> >
> > or (Postgres RDS backend):
> >
> > psycopg2.OperationalError: could not translate host name $AWS RDS address
> >
> > When we restart Airflow, it becomes fine, and the job scheduler and
> > website are both running fine. However, it will fail again after a
> > couple of days of smooth running, with the same error message.
> >
> > We found on Stack Overflow that other people are experiencing the same
> > issue, but no solution has been found. Does anyone know how to resolve
> > the issue?
> >
> > Thanks,
> >
> > --
> > Ricky Shi

--
Ricky Shi