Umesh Padashetty created ATLAS-4072:
---------------------------------------
Summary: spark_column_lineage missing for insert into select *
queries run via spark-shell
Key: ATLAS-4072
URL: https://issues.apache.org/jira/browse/ATLAS-4072
Project: Atlas
Issue Type: Bug
Components: atlas-core
Reporter: Umesh Padashetty
From the spark-shell, I ran the following queries:
* spark.sql("create table umesh(name string)");
* spark.sql("create table umesh_insert(name string)");
* spark.sql("insert into umesh_insert select * from umesh");
A spark_process is created between the umesh and umesh_insert tables, but the
spark_column_lineage between the umesh.name and umesh_insert.name columns is
missing.
!Screenshot 2020-12-11 at 1.29.39 AM.png|width=438,height=435!
To cross-verify the behavior, I ran similar Hive queries via beeline. There, in
addition to the hive_process created between the umesh_hive and
umesh_hive_insert tables, a hive_column_lineage is created between the
umesh_hive.name and umesh_hive_insert.name columns.
Queries run via beeline:
* create table umesh_hive(name string);
* create table umesh_hive_insert(name string);
* insert into umesh_hive_insert select * from umesh_hive;
!Screenshot 2020-12-11 at 1.34.24 AM.png|width=441,height=548!
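One way to check the same thing without the UI is to query Atlas by entity type. The following is only a minimal sketch that builds basic-search URLs for the process and column-lineage types named above. The helper name atlasBasicSearchUrl, the object name, and the base URL http://localhost:21000 are assumptions for illustration and are not part of this report. Fetching the URLs (for example with curl and valid credentials) is left out.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

object AtlasLineageCheck {
  // Hypothetical helper (not from the report): builds an Atlas v2
  // basic-search URL that lists entities of a single type.
  def atlasBasicSearchUrl(baseUrl: String, typeName: String, limit: Int = 25): String = {
    val encodedType = URLEncoder.encode(typeName, StandardCharsets.UTF_8.name())
    s"${baseUrl.stripSuffix("/")}/api/atlas/v2/search/basic?typeName=$encodedType&limit=$limit"
  }

  def main(args: Array[String]): Unit = {
    // Assumed Atlas address; replace with the address of your own server.
    val base = "http://localhost:21000"
    // Entity types mentioned in this report.
    val types = Seq("spark_process", "spark_column_lineage",
                    "hive_process", "hive_column_lineage")
    types.foreach(t => println(atlasBasicSearchUrl(base, t)))
  }
}
```

If the behavior described here holds, the search for spark_column_lineage would be expected to return no entity linking umesh.name to umesh_insert.name, while the hive_column_lineage search would return one for the Hive tables.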
--
This message was sent by Atlassian Jira
(v8.3.4#803005)