mik-laj commented on issue #7217: [AIRFLOW-NNNN] Store DAG's source code in the serialized_dag table
URL: https://github.com/apache/airflow/pull/7217#issuecomment-576335959

# Database schema

These considerations focus on the tables involved in collecting DAGs, so the diagrams below show only the following tables:
* dag
* serialized_dag

# Current schema

<img width="803" alt="Screenshot 2020-01-20 at 16 20 41" src="https://user-images.githubusercontent.com/12058428/72740623-54b55000-3ba6-11ea-8da3-55630b0e036f.png">

## Schema changes proposed by anitakar

<img width="803" alt="Screenshot 2020-01-20 at 16 20 41" src="https://user-images.githubusercontent.com/12058428/72740538-26377500-3ba6-11ea-870b-3f951682677c.png">

Anita suggests adding a new `source_code` field to the `serialized_dag` table.

# My proposal

<img width="438" alt="Screenshot 2020-01-20 at 16 56 23" src="https://user-images.githubusercontent.com/12058428/72740548-2df71980-3ba6-11ea-91e4-7056df323fc1.png">

I think we should add a new `dag_file` table to avoid duplicating the source code. The new table has `fileloc` and `fileloc_hash` as its primary key. The `dag` table currently contains only the `fileloc` field, but I think it would also be helpful to add `fileloc_hash` to it. I also used a blob type (`BYTEA` in PostgreSQL), because we don't need to process the source code with text functions in the database.

Migration script for PostgreSQL:

```sql
-- One row per DAG file; the source code is stored once per file.
CREATE TABLE dag_file
(
    fileloc      VARCHAR(2000)            NOT NULL,
    fileloc_hash INTEGER                  NOT NULL,
    last_updated TIMESTAMP WITH TIME ZONE NOT NULL,
    source_code  BYTEA                    NOT NULL,
    PRIMARY KEY (fileloc, fileloc_hash)
);

-- Add the column with a temporary default so existing rows satisfy NOT NULL,
-- then drop the default so new rows must supply a value.
ALTER TABLE dag
    ADD fileloc_hash INTEGER NOT NULL DEFAULT 0;
ALTER TABLE dag
    ALTER COLUMN fileloc_hash DROP DEFAULT;
```
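
To illustrate the deduplication argument, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL. The column types are simplified, the `dag` table is reduced to three columns, and the DAG ids, file path, timestamp and hash value are all made up for the example; none of this is Airflow's actual code. It shows two DAGs defined in the same file sharing a single `dag_file` row.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_file (
        fileloc      TEXT    NOT NULL,
        fileloc_hash INTEGER NOT NULL,
        last_updated TEXT    NOT NULL,
        source_code  BLOB    NOT NULL,
        PRIMARY KEY (fileloc, fileloc_hash)
    );
    -- Simplified version of the dag table: only the columns relevant here.
    CREATE TABLE dag (
        dag_id       TEXT    PRIMARY KEY,
        fileloc      TEXT    NOT NULL,
        fileloc_hash INTEGER NOT NULL
    );
    """
)

# Hypothetical sample data.
fileloc = "/dags/example.py"
fileloc_hash = 12345  # placeholder value, not Airflow's real hash function
source = b"# two DAGs defined in one file\n"

# The source code is stored once, keyed by (fileloc, fileloc_hash).
conn.execute(
    "INSERT INTO dag_file VALUES (?, ?, ?, ?)",
    (fileloc, fileloc_hash, "2020-01-20T16:56:23+00:00", source),
)
# Both DAGs point at the same file.
conn.executemany(
    "INSERT INTO dag VALUES (?, ?, ?)",
    [("dag_a", fileloc, fileloc_hash), ("dag_b", fileloc, fileloc_hash)],
)

# Look up the source code for each DAG through the composite key.
rows = conn.execute(
    """
    SELECT d.dag_id, f.source_code
    FROM dag AS d
    JOIN dag_file AS f
      ON f.fileloc = d.fileloc AND f.fileloc_hash = d.fileloc_hash
    ORDER BY d.dag_id
    """
).fetchall()

stored_copies = conn.execute("SELECT COUNT(*) FROM dag_file").fetchone()[0]
```

With a `source_code` column on `serialized_dag` instead, the same bytes would be written once per DAG rather than once per file.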
