vladhlinsky opened a new pull request #93: ATLAS-3661 Create 'spark_column_lineage' type and relationship definition
URL: https://github.com/apache/atlas/pull/93

## What changes were proposed in this pull request?

Create the `spark_column_lineage` type and relationship definition to add support for column-level lineage for `CREATE TABLE AS SELECT ...` statements and views. Column-level lineage is the lineage created between input and output columns. For example:

```
hive > create table employee_ctas as select id from employee;
```

For the above query, lineage is created from `employee` to `employee_ctas`, and also from `employee.id` to `employee_ctas.id`.

## How was this patch tested?

Manually, using a modified version of Spark Atlas Connector:
- Installed and started Atlas.
- Updated `1100-spark_model.json` with the proposed changes and restarted Atlas.
- Executed the following statements using spark-shell:
```
spark.sql("create table sparkemployee_1_2(id int,name string)");
spark.sql("create table sparkemployee_ctas_1_2 as select id from sparkemployee_1_2");
```
- Verified that each table has column entities and that a `spark_column_lineage` entity is created.
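
To make the column-level lineage from the `employee_ctas` example concrete, here is a minimal Python sketch of the input-to-output column mapping. It is only an illustration: `ColumnLineage` and `ctas_column_lineage` are hypothetical names, not part of Atlas or Spark Atlas Connector, and the sketch assumes a plain `SELECT` of named columns from a single table, with no expressions, aliases, or joins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnLineage:
    """Hypothetical stand-in for one column-level lineage edge.

    This is not the Atlas `spark_column_lineage` type definition; it only
    models "these input columns feed this output column".
    """
    inputs: tuple  # qualified input columns, e.g. ("employee.id",)
    output: str    # qualified output column, e.g. "employee_ctas.id"


def ctas_column_lineage(source_table, target_table, selected_columns):
    """Build one lineage edge per selected column for a simple
    `CREATE TABLE target AS SELECT col, ... FROM source` statement.

    Assumption: every selected column keeps its name in the target table.
    """
    return [
        ColumnLineage(
            inputs=(f"{source_table}.{col}",),
            output=f"{target_table}.{col}",
        )
        for col in selected_columns
    ]


# The example from the description: create table employee_ctas as select id from employee
edges = ctas_column_lineage("employee", "employee_ctas", ["id"])
for edge in edges:
    print(", ".join(edge.inputs), "->", edge.output)
```

Table-level lineage (`employee` to `employee_ctas`) would sit alongside these edges; the sketch models only the per-column part that this change adds.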