[ 
https://issues.apache.org/jira/browse/HIVE-28409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928081#comment-17928081
 ] 

Krisztian Kasa commented on HIVE-28409:
---------------------------------------

[~Aggarwal_Raghav] 
We marked HIVE_LINEAGE_INFO as deprecated because we need finer-grained control 
over lineage generation. The new property that should control this is 
[HIVE_LINEAGE_STATEMENT_FILTER|https://github.com/apache/hive/blob/523f7b7f0ae2951eccdf1eb08a426d0984d36e41/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L3883C5-L3888].
 HIVE_LINEAGE_INFO is set to false by default, but 
HIVE_LINEAGE_STATEMENT_FILTER enables lineage generation for all types of SQL 
statements.

Hardcoding any hook class name is considered bad practice, which we would like 
to avoid in the future. I believe we should replace the existing checks for 
hardcoded class names with checks for the values of 
HIVE_LINEAGE_STATEMENT_FILTER and HIVE_LINEAGE_INFO.
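
As a rough illustration of that idea, here is a minimal sketch of a check that relies only on the two configuration values rather than on hook class names. It is not Hive code: the class {{LineageGate}}, the method {{isLineageEnabled}}, the use of a plain {{Map}} in place of HiveConf, and the filter syntax ("ALL", "NONE", or a comma-separated list of statement types) are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Hive. */
final class LineageGate {
    // Placeholder keys standing in for the two HiveConf variables.
    static final String LINEAGE_INFO = "HIVE_LINEAGE_INFO";
    static final String STATEMENT_FILTER = "HIVE_LINEAGE_STATEMENT_FILTER";

    private LineageGate() {}

    /**
     * Decides whether lineage should be generated for a statement type.
     * Assumed filter syntax: "ALL", "NONE", or a comma-separated list.
     */
    static boolean isLineageEnabled(Map<String, String> conf, String statementType) {
        // The deprecated boolean flag, if set, enables lineage for everything.
        if (Boolean.parseBoolean(conf.getOrDefault(LINEAGE_INFO, "false"))) {
            return true;
        }
        // Fallback to "NONE" is this sketch's choice, not necessarily Hive's default.
        String filter = conf.getOrDefault(STATEMENT_FILTER, "NONE").trim();
        if (filter.equalsIgnoreCase("ALL")) {
            return true;
        }
        if (filter.isEmpty() || filter.equalsIgnoreCase("NONE")) {
            return false;
        }
        // Otherwise the statement type must appear in the comma-separated list.
        String wanted = statementType.trim().toUpperCase(Locale.ROOT);
        return Arrays.stream(filter.split(","))
                .map(s -> s.trim().toUpperCase(Locale.ROOT))
                .anyMatch(wanted::equals);
    }
}
```

With a check of this shape, no hook class name appears in the decision, so a hook such as the Atlas one would get lineage purely through configuration.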

Please feel free to submit a PR.

> Column lineage when creating view is missing if atlas HiveHook is set
> ---------------------------------------------------------------------
>
>                 Key: HIVE-28409
>                 URL: https://issues.apache.org/jira/browse/HIVE-28409
>             Project: Hive
>          Issue Type: Bug
>          Components: lineage
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>
> Column lineage info is collected by 
> {{org.apache.hadoop.hive.ql.optimizer.lineage.Generator}}. This is called 
> during Hive optimizations and view creation if one of these conditions is met:
> {code:java}
> hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_LINEAGE_INFO)
>         || postExecHooks.contains("org.apache.hadoop.hive.ql.hooks.PostExecutePrinter")
>         || postExecHooks.contains("org.apache.hadoop.hive.ql.hooks.LineageLogger")
>         || postExecHooks.contains("org.apache.atlas.hive.hook.HiveHook")
> {code}
> [https://github.com/apache/hive/blob/09553fca66ff69ff870c8a181750b70d81a8640e/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java#L78-L81]
> and 
> [https://github.com/apache/hive/blob/09553fca66ff69ff870c8a181750b70d81a8640e/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L13226-L13228]
> However, HIVE-17125 introduced more conditions, which affect only the 
> {{org.apache.atlas.hive.hook.HiveHook}}
> [https://github.com/apache/hive/blob/09553fca66ff69ff870c8a181750b70d81a8640e/ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/Generator.java#L75-L86]
>  
> Later, HIVE-23244 changed the code that handles view creation. Since there 
> are no tests at all for view creation when 
> {{org.apache.atlas.hive.hook.HiveHook}} is specified, nothing caught that 
> the new code skips column lineage info collection in that case.
> The existing tests for column lineage info collection use 
> [LineageLogger.java|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/LineageLogger.java],
>  which is not subject to any restriction in the Generator, so column lineage 
> info is always collected when LineageLogger is set.


