[
https://issues.apache.org/jira/browse/HIVE-28409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927788#comment-17927788
]
Raghav Aggarwal commented on HIVE-28409:
----------------------------------------
Hi [~kkasa], I have 2 questions, hoping you can help me here:
# Marking HIVE_LINEAGE_INFO as deprecated: I have a use case where I want to
write a custom hook that uses lineage information, but lineage information
won’t be populated unless I add the new class name
[here|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java#L78-L81].
Something similar is mentioned in the description of HIVE-24051:
{code:java}
postExecHooks.contains("custom.class.name") // custom hook class name {code}
# In
[SemanticAnalyzer|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L13270-L13272]
the HIVE_LINEAGE_INFO config-based check is not present. I think it should be
the same as in
[Optimizer|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java#L78-L81],
since otherwise view-related lineage information might not be captured for custom hooks.
Let me know if changes are required; I can create a follow-up PR for this.
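To make question 1 concrete, here is a minimal, self-contained sketch of the check as it is quoted in the issue description below. It is not the actual Hive code: the class name LineageCheckSketch and the method shouldCollectLineage are hypothetical, and the boolean parameter stands in for the HIVE_LINEAGE_INFO lookup. It only illustrates that a custom hook class name alone does not satisfy the condition, while the config flag does.

```java
import java.util.Set;

// Hypothetical stand-in for the lineage condition; not part of Hive.
final class LineageCheckSketch {

    // Hook class names hard-coded in the condition quoted in the issue description.
    static final Set<String> KNOWN_LINEAGE_HOOKS = Set.of(
        "org.apache.hadoop.hive.ql.hooks.PostExecutePrinter",
        "org.apache.hadoop.hive.ql.hooks.LineageLogger",
        "org.apache.atlas.hive.hook.HiveHook");

    // lineageInfoEnabled models hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_LINEAGE_INFO).
    static boolean shouldCollectLineage(boolean lineageInfoEnabled, Set<String> postExecHooks) {
        if (lineageInfoEnabled) {
            return true;
        }
        for (String hook : KNOWN_LINEAGE_HOOKS) {
            if (postExecHooks.contains(hook)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> customOnly = Set.of("custom.class.name");
        // Flag off, custom hook only: the condition is not met.
        System.out.println(shouldCollectLineage(false, customOnly));
        // Flag on: the condition is met regardless of which hooks are configured.
        System.out.println(shouldCollectLineage(true, customOnly));
    }
}
```

If the flag is deprecated and later removed, the first case is the only one left for a custom hook, which is why I am asking about it.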
> Column lineage when creating view is missing if atlas HiveHook is set
> ---------------------------------------------------------------------
>
> Key: HIVE-28409
> URL: https://issues.apache.org/jira/browse/HIVE-28409
> Project: Hive
> Issue Type: Bug
> Components: lineage
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Column lineage info is collected by
> {{org.apache.hadoop.hive.ql.optimizer.lineage.Generator}}. This is called
> during Hive optimizations and view creation if one of these conditions is met:
> {code:java}
> hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_LINEAGE_INFO)
>     || postExecHooks.contains("org.apache.hadoop.hive.ql.hooks.PostExecutePrinter")
>     || postExecHooks.contains("org.apache.hadoop.hive.ql.hooks.LineageLogger")
>     || postExecHooks.contains("org.apache.atlas.hive.hook.HiveHook")
> {code}
> [https://github.com/apache/hive/blob/09553fca66ff69ff870c8a181750b70d81a8640e/ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java#L78-L81]
> and
> [https://github.com/apache/hive/blob/09553fca66ff69ff870c8a181750b70d81a8640e/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L13226-L13228]
> However, HIVE-17125 introduced more conditions which affect only the
> {{org.apache.atlas.hive.hook.HiveHook}}
> [https://github.com/apache/hive/blob/09553fca66ff69ff870c8a181750b70d81a8640e/ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/Generator.java#L75-L86]
>
> Later, HIVE-23244 changed the code that handles view creation. There are no
> tests at all for view creation when {{org.apache.atlas.hive.hook.HiveHook}}
> is specified, and the new code skips column lineage info collection in that case.
> The tests we have for column lineage info collection use
> [LineageLogger.java|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/hooks/LineageLogger.java],
> which doesn't have any restriction in the Generator, so column lineage info
> is always collected when LineageLogger is set.