This is an automated email from the ASF dual-hosted git repository.

krisztiankasa pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/hive-site.git


The following commit(s) were added to refs/heads/main by this push:
     new bc55f52  HIVE-28768: Remove hardcoded post exec hooks (#43)
bc55f52 is described below

commit bc55f52c275064f63b396525421f6c5a35184520
Author: Raghav Aggarwal <[email protected]>
AuthorDate: Thu Mar 20 17:04:58 2025 +0530

    HIVE-28768: Remove hardcoded post exec hooks (#43)
---
 content/docs/latest/capture-lineage-info.md | 44 +++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/content/docs/latest/capture-lineage-info.md 
b/content/docs/latest/capture-lineage-info.md
new file mode 100644
index 0000000..60e015f
--- /dev/null
+++ b/content/docs/latest/capture-lineage-info.md
@@ -0,0 +1,44 @@
+# Capture Lineage Information In Hive Hooks
+
+## Background
+
+In Hive, lineage information is captured in the form of `LineageInfo` object. 
This object is created in the `SemanticAnalyzer` and is passed to the 
`HookContext` object. Users can use the following existing Hooks or implement 
their own custom hooks to capture this information and utilize it.
+
+##### Existing Hooks
+
+- org.apache.hadoop.hive.ql.hooks.PostExecutePrinter
+- org.apache.hadoop.hive.ql.hooks.LineageLogger
+- org.apache.atlas.hive.hook.HiveHook
+
+To facilitate the capture of lineage information in a custom hook or in a use 
case where the [existing hooks]({{< ref "#existing-hooks" >}}) are not set in 
`hive.exec.post.hooks`, a new configuration `hive.lineage.hook.info.enabled` 
was introduced in 
[HIVE-24051](https://issues.apache.org/jira/browse/HIVE-24051). This 
configuration is set to `false` by default.
+
+To provide filtering capability on query type in the lineage information, a 
new configuration `hive.lineage.hook.info.query.type` was introduced in 
[HIVE-28409](https://issues.apache.org/jira/browse/HIVE-28409), with default 
value as "_ALL_". Users can tune the configuration accordingly to capture 
lineage information only for the required query types. In 
[HIVE-28409](https://issues.apache.org/jira/browse/HIVE-28409), the previously 
introduced configuration `hive.lineage.hook.info.enabled [...]
+
+**NOTE: HIVE-28409, will be available in Hive-4.1.0 release.**
+
+Usage example:
+
+````
+hive.lineage.hook.info.query.type=ALL                                -- will 
capture lineage info for all the queries.
+hive.lineage.hook.info.query.type=CREATE_VIEW,CREATE_TABLE_AS_SELECT -- will 
capture lineage info related to these 2 particulare query types only.
+hive.lineage.hook.info.query.type=NONE                               -- will 
not capture lineage info for any query.
+````
+
+Previously, to capture lineage information, users has 2 ways:
+1. Set any of the above mentioned [existing hooks]({{< ref "#existing-hooks" 
>}}) in `hive.exec.post.hooks` configuration.
+2. Set `hive.lineage.hook.info.enabled` as true in cluster and restart 
HiveServer2 service. (Valid since Hive-4.0.0 release).
+
+**NOTE**: Just by enabling `hive.lineage.hook.info.enabled`, lineage 
information for "Create View" query type won't be captured, user has to set the 
[existing hooks]({{< ref "#existing-hooks" >}}) in `hive.exec.post.hooks` along 
with their custom hook class name.
+
+## Changes done in 
[HIVE-28768](https://issues.apache.org/jira/browse/HIVE-28768)
+
+The hardcoded values of the [existing hooks]({{< ref "#existing-hooks" >}}) 
that capture lineage information in `SemanticAnalyzer` and `Optimizer` code has 
been removed and to determine, whether lineage information should be captured 
or not, the value of `hive.lineage.hook.info.query.type` configuration is 
checked. **The default value of `hive.lineage.hook.info.query.type` has been 
set to "_NONE_".**
+
+## Implications of 
[HIVE-28768](https://issues.apache.org/jira/browse/HIVE-28768) on users
+
+1. Users migrating directly from Hive-3.x to HIVE-4.1.0 **will observe 
breaking changes** in the way lineage information is captured. Setting 
`hive.exec.post.hooks` to any of the [existing hooks]({{< ref "#existing-hooks" 
>}}) will not capture lineage information anymore. Users will have to make use 
of `hive.lineage.hook.info.query.type` configuration to capture lineage 
information.
+2. Users migrating from Hive-4.0.x to Hive-4.1.0 who don't have 
`hive.lineage.hook.info.enabled` set to true, **will also observe breaking 
changes** in the way lineage information is captured.
+
+***
+**NOTE: Recommended way to capture lineage information is though 
`hive.lineage.hook.info.query.type` configuration as  
`hive.lineage.hook.info.enabled` is marked as deprecated and is subjected to be 
removed in future release**
+***

Reply via email to