[
https://issues.apache.org/jira/browse/AIRFLOW-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
MeiK updated AIRFLOW-5096:
--------------------------
Priority: Major (was: Minor)
> reduce the number of times the pickle is inserted into the database by
> modifying the hash field of Dag
> ------------------------------------------------------------------------------------------------------
>
> Key: AIRFLOW-5096
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5096
> Project: Apache Airflow
> Issue Type: Improvement
> Components: DAG
> Affects Versions: 1.10.3
> Reporter: MeiK
> Assignee: MeiK
> Priority: Major
>
> When the scheduler runs with the --do_pickle option enabled, it inserts a
> pickle of every DAG into the database on each file scan, which causes the
> database to grow rapidly.
> In my opinion, the main cause is that the hash used to decide whether a DAG
> matches its pickled version includes the last_loaded field, which changes
> every time the file is read rather than only when it is modified. As a
> result, Airflow repeatedly inserts large amounts of unchanged data into the
> database.
> I created a commit that uses the file's last-modified time instead of
> last_loaded as the hash field, and it works fine on my machine.
> Please let me know if there is a better approach.
> English is not my native language; please excuse typing errors.
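The idea described above can be sketched as follows. This is a hypothetical, self-contained illustration, not Airflow's actual scheduler code: the function names (`dag_pickle_hash`, `should_insert_pickle`) and the in-memory cache are assumptions made for the example. The point is that hashing stable inputs (the DAG id plus the file's mtime) instead of a per-read value like last_loaded makes the hash change only when the file actually changes, so unchanged pickles can be skipped.

```python
import hashlib
import os

def dag_pickle_hash(dag_id: str, fileloc: str) -> str:
    """Hash the DAG id together with its file's last-modified time.

    Unlike a last_loaded timestamp, the mtime changes only when the DAG
    file itself is edited, so repeated scans produce the same hash.
    """
    mtime = os.path.getmtime(fileloc)
    return hashlib.sha256(f"{dag_id}:{mtime}".encode()).hexdigest()

# dag_id -> hash of the pickle we last stored (stands in for a DB lookup)
_last_hashes = {}

def should_insert_pickle(dag_id: str, fileloc: str) -> bool:
    """Return True only when the DAG's hash has changed since the last insert."""
    h = dag_pickle_hash(dag_id, fileloc)
    if _last_hashes.get(dag_id) == h:
        return False  # file unchanged since last scan: skip the insert
    _last_hashes[dag_id] = h
    return True
```

With this scheme, scanning the same unmodified file many times triggers at most one insert, while editing the file (which bumps its mtime) triggers exactly one new insert on the next scan.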
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)