We're sitting at over 2.4M task instances in our metadata DB without much trouble. Have you seen substantial performance degradation, or are you just worried about the future possibility?

On Wed, Apr 19, 2017 at 12:23 PM Maxime Beauchemin <[email protected]> wrote:

> You can archive the `job` and `task_instance` tables; the scheduler won't
> try to backfill them as their respective DagRuns are not in a `running`
> state. The scheduler only tries to schedule active DagRuns, and only
> creates new [active] DagRuns forward from the latest one.
>
> Note that the criterion to archive `task_instance` should be based on
> `start_date` and not `execution_date` as you don't want the archiving to
> interfere with backfills or anything ongoing.
>
> Max
>
> On Wed, Apr 19, 2017 at 5:41 AM, Yongjun Park <[email protected]>
> wrote:
>
> > Hi folks.
> >
> > I have a question about task instances.
> >
> > Is it possible to delete old task instances that have run successfully?
> > Won't the scheduler try to backfill the missing tasks?
> >
> > I have about 1,500 DAGs and the number is growing. There are about 300
> > thousand task instances currently, and about 10,000 task instances are
> > created every day. That will add about 3.6 million rows to the MySQL
> > table in a year.
> >
> > I'm concerned that the table storing task instances will grow large
> > enough to cause performance degradation.
> >
> > How can I keep the task instance table from becoming bloated?
> >
> > Thanks,
> > Yongjun
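
For illustration, the archiving approach Max describes in the quoted message (selecting rows by `start_date` rather than `execution_date`) could be sketched roughly as below. This is only a minimal sketch under stated assumptions: it uses an in-memory SQLite database with a simplified, made-up subset of the `task_instance` columns instead of the real Airflow schema on MySQL, and the `task_instance_archive` table and the `archive_task_instances` and `make_demo_db` helpers are hypothetical names, not part of Airflow.

```python
import sqlite3
from datetime import datetime


def archive_task_instances(conn, cutoff):
    """Move task_instance rows that started before `cutoff` into the archive table.

    Rows are selected on start_date, not execution_date, so a recent backfill
    of an old execution_date is left alone. Rows with a NULL start_date
    (not started yet) never match the comparison and are kept as well.
    Returns the number of rows moved.
    """
    cutoff_str = cutoff.isoformat(sep=" ")
    with conn:  # copy and delete in a single transaction
        conn.execute(
            "INSERT INTO task_instance_archive "
            "SELECT * FROM task_instance WHERE start_date < ?",
            (cutoff_str,),
        )
        cur = conn.execute(
            "DELETE FROM task_instance WHERE start_date < ?",
            (cutoff_str,),
        )
        return cur.rowcount


def make_demo_db():
    """Build a toy database; the column list is an assumed subset, not the real schema."""
    conn = sqlite3.connect(":memory:")
    cols = "(task_id TEXT, dag_id TEXT, execution_date TEXT, start_date TEXT, state TEXT)"
    conn.execute("CREATE TABLE task_instance " + cols)
    conn.execute("CREATE TABLE task_instance_archive " + cols)
    rows = [
        # Old run that started long ago: should be archived.
        ("t1", "dag_a", "2016-01-01 00:00:00", "2016-01-01 00:05:00", "success"),
        # Old execution_date, but backfilled recently: should stay.
        ("t1", "dag_a", "2016-01-02 00:00:00", "2017-04-18 09:00:00", "success"),
        # Not started yet: should stay.
        ("t1", "dag_a", "2017-04-19 00:00:00", None, "queued"),
    ]
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


conn = make_demo_db()
moved = archive_task_instances(conn, datetime(2017, 1, 1))
print("rows archived:", moved)
```

On a real metadata database you would probably want to run something like this in small batches during a quiet period, and try it on a copy of the database first.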
