[ 
https://issues.apache.org/jira/browse/ATLAS-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carol Drummond updated ATLAS-2975:
----------------------------------
    Labels:   (was: release-notes)

> Hive hook generates duplicate column_lineage entities
> -----------------------------------------------------
>
>                 Key: ATLAS-2975
>                 URL: https://issues.apache.org/jira/browse/ATLAS-2975
>             Project: Atlas
>          Issue Type: Bug
>          Components: atlas-intg
>    Affects Versions: 1.0.0, 0.8.3, 1.1.0
>            Reporter: Madhan Neethiraj
>            Assignee: Madhan Neethiraj
>            Priority: Major
>             Fix For: 0.8.4, 1.2.0, 2.0.0
>
>         Attachments: ATLAS-2975-master.patch
>
>
> Hive hook is expected to create one column-lineage entity for each column in 
> the output table. However, when multiple partitions are involved, the hook 
> might generate multiple column-lineage entities for each output column - one 
> entity for each partition. This can result in a large number of duplicate 
> column-lineage entities, depending on the number of partitions. Such 
> duplicate entities should be avoided.
> Here is sample HiveQL to reproduce this issue:
> {noformat}
> CREATE TABLE visitors(name STRING, dob DATE) PARTITIONED BY (yob INT);
> CREATE TABLE visitors_log(name STRING, dob DATE);
> INSERT INTO TABLE visitors_log VALUES('John',  '1980-08-08'),
>                                      ('Jack',  '1980-09-09'),
>                                      ('Kevin', '1990-10-10'),
>                                      ('Ken',   '1990-11-11'),
>                                      ('Larry', '1995-12-12');
> SET hive.exec.dynamic.partition.mode=nonstrict;
> INSERT INTO TABLE visitors PARTITION(yob) SELECT name, dob, YEAR(dob) yob 
> FROM visitors_log;
> {noformat}
> In the above case, columns visitors.name and visitors.dob will each have 3 
> column-lineage entities - one for each partition: 1980, 1990 and 1995.
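
The duplication described above can be illustrated with a minimal sketch. This is not the Atlas hook code and not the attached patch; the tuple shape and the function name build_column_lineage are hypothetical, chosen only to show why keying lineage on the output column rather than on the partition yields one entry per column.

```python
def build_column_lineage(dependencies):
    """Collapse per-partition dependencies into one entry per output column.

    `dependencies` is an iterable of (partition, output_column, input_column)
    tuples. This shape is an assumption made for illustration only.
    """
    lineage = {}
    for _partition, output_column, input_column in dependencies:
        # Key on the output column only, so the partition does not
        # multiply the number of lineage entries.
        inputs = lineage.setdefault(output_column, [])
        if input_column not in inputs:
            inputs.append(input_column)
    return lineage


# The repro writes three partitions (yob=1980, 1990, 1995); a naive
# per-partition walk sees each output column once per partition.
partitions = ["yob=1980", "yob=1990", "yob=1995"]
column_pairs = [
    ("visitors.name", "visitors_log.name"),
    ("visitors.dob", "visitors_log.dob"),
]
dependencies = [
    (partition, output_column, input_column)
    for partition in partitions
    for output_column, input_column in column_pairs
]

lineage = build_column_lineage(dependencies)
```

With the repro data, the six per-partition dependencies collapse to two entries, one for visitors.name and one for visitors.dob.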



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
