[ 
https://issues.apache.org/jira/browse/AIRFLOW-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

MeiK updated AIRFLOW-5096:
--------------------------
    Priority: Major  (was: Minor)

> reduce the number of times the pickle is inserted into the database by 
> modifying the hash field of Dag
> ------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5096
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5096
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: DAG
>    Affects Versions: 1.10.3
>            Reporter: MeiK
>            Assignee: MeiK
>            Priority: Major
>
> When the scheduler runs with the --do_pickle option enabled, it inserts a 
> pickle of every DAG into the database on each file scan, which causes the 
> database to swell rapidly.
> In my opinion, the main cause is that the hash function used to decide 
> whether a DAG matches its pickled version includes the last_loaded field, 
> which changes every time the file is read rather than only when it is 
> modified. As a result, Airflow inserts a large amount of unchanged data 
> into the database.
> I created a commit that uses the file's last-modified time instead of 
> last_loaded as the hash field, and it works fine on my machine. Please let 
> me know if there is a better approach.
> English is not my native language; please excuse typing errors.
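The difference between the two hash inputs can be sketched as below. This is an illustration only, not Airflow's actual code; the function names and the SHA-1 fingerprint are hypothetical, but they show why a load-timestamp-based hash changes on every scan while a file-mtime-based hash stays stable until the DAG file is edited.

```python
import hashlib
import os
import time


def fingerprint_with_load_time(dag_id: str) -> str:
    # Mixing the load timestamp into the hash makes the fingerprint
    # change on every scan, so each scan looks like a "new" DAG
    # version and a fresh pickle row is inserted.
    last_loaded = time.time()
    return hashlib.sha1(f"{dag_id}:{last_loaded}".encode()).hexdigest()


def fingerprint_with_mtime(dag_id: str, dag_file: str) -> str:
    # Proposed alternative: use the DAG file's last-modified time,
    # which only changes when the file itself is edited, so an
    # unchanged DAG keeps the same fingerprint across scans and no
    # duplicate pickle needs to be inserted.
    mtime = os.path.getmtime(dag_file)
    return hashlib.sha1(f"{dag_id}:{mtime}".encode()).hexdigest()
```

With the mtime variant, repeated scans of an unchanged file yield the same fingerprint, so the scheduler can skip the insert.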



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
