mik-laj commented on issue #7217: [AIRFLOW-NNNN] Store DAG's source code in the 
serialized_dag table
URL: https://github.com/apache/airflow/pull/7217#issuecomment-576335959
 
 
   
   # Schema database
   These considerations focus on a small collection of tables, and the diagrams 
below show only the following ones:
   * dag
   * serialized_dag
   
   # Current schema
   
   <img width="803" alt="Screenshot 2020-01-20 at 16 20 41" 
src="https://user-images.githubusercontent.com/12058428/72740623-54b55000-3ba6-11ea-8da3-55630b0e036f.png">
   
   ## Schema changes proposed by anitakar
   
   <img width="803" alt="Screenshot 2020-01-20 at 16 20 41" 
src="https://user-images.githubusercontent.com/12058428/72740538-26377500-3ba6-11ea-870b-3f951682677c.png">
   
   Anita suggests adding a new `source_code` field to the `serialized_dag` table.
   
   # My proposition
   
   <img width="438" alt="Screenshot 2020-01-20 at 16 56 23" 
src="https://user-images.githubusercontent.com/12058428/72740548-2df71980-3ba6-11ea-91e4-7056df323fc1.png">
   
   I think we should add a new `dag_file` table to avoid duplicating source 
code. The new table has `fileloc` and `fileloc_hash` as its primary key. The 
`dag` table currently contains only the `fileloc` field, but I think it would 
also be helpful to add `fileloc_hash`. I also used a blob type, because we 
don't need to process this code with text functions in the database.
   
   Migration script for PostgreSQL:
   ```sql
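   -- One row per DAG file; the source code is stored as binary data.
   create table dag_file
   (
        fileloc varchar(2000) not null,
        fileloc_hash integer not null,
        last_updated timestamp with time zone not null,
        source_code bytea not null,
        primary key (fileloc, fileloc_hash)
   );

   -- Add the column with a temporary default so existing rows get a value,
   -- then drop the default so new rows must set it explicitly.
   alter table dag add fileloc_hash integer not null default 0;
   alter table dag alter column fileloc_hash drop default;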
   ```
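
To illustrate the deduplication argument, here is a minimal Python sketch that mirrors the proposed layout in an in-memory SQLite database rather than PostgreSQL. Only the `dag_file` columns come from the migration script above. The hash function, the file path, the DAG ids, and the trimmed-down `dag` table are assumptions made up for this example, and `hypothetical_fileloc_hash` is not Airflow's actual hashing function.

```python
import sqlite3
import zlib


def hypothetical_fileloc_hash(fileloc: str) -> int:
    """Stand-in hash for illustration only; not Airflow's real function."""
    # Mask to 31 bits so the value fits a signed 32-bit integer column.
    return zlib.crc32(fileloc.encode("utf-8")) & 0x7FFFFFFF


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dag_file (
        fileloc      TEXT    NOT NULL,
        fileloc_hash INTEGER NOT NULL,
        last_updated TEXT    NOT NULL,
        source_code  BLOB    NOT NULL,
        PRIMARY KEY (fileloc, fileloc_hash)
    );
    -- Trimmed-down stand-in for the real dag table (assumption).
    CREATE TABLE dag (
        dag_id       TEXT PRIMARY KEY,
        fileloc      TEXT    NOT NULL,
        fileloc_hash INTEGER NOT NULL
    );
    """
)

fileloc = "/dags/example.py"  # hypothetical path
source = b"# two DAGs defined in one file\n"
file_hash = hypothetical_fileloc_hash(fileloc)

# Store the source once per file ...
conn.execute(
    "INSERT INTO dag_file VALUES (?, ?, ?, ?)",
    (fileloc, file_hash, "2020-01-20T16:00:00+00:00", source),
)
# ... while each DAG row only carries the reference to that file.
for dag_id in ("example_dag_a", "example_dag_b"):
    conn.execute("INSERT INTO dag VALUES (?, ?, ?)", (dag_id, fileloc, file_hash))

# Resolve each DAG to its source code through the composite key.
rows = conn.execute(
    """
    SELECT dag.dag_id, dag_file.source_code
    FROM dag
    JOIN dag_file
      ON dag_file.fileloc = dag.fileloc
     AND dag_file.fileloc_hash = dag.fileloc_hash
    ORDER BY dag.dag_id
    """
).fetchall()
source_row_count = conn.execute("SELECT COUNT(*) FROM dag_file").fetchone()[0]
```

In this sketch `rows` has one entry per DAG while `source_row_count` stays at 1, so both DAGs resolve to the same stored copy of the file. Keeping `source_code` in `serialized_dag` instead would store that copy once per DAG.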
