This is an automated email from the ASF dual-hosted git repository.

peacewong pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/linkis-website.git


The following commit(s) were added to refs/heads/dev by this push:
     new df956d12a86 Add hive-lineage.md (#754)
df956d12a86 is described below

commit df956d12a8609d6dae3ce8774e174bd6bdd2ec28
Author: ChengJie1053 <[email protected]>
AuthorDate: Tue Sep 26 20:12:21 2023 +0800

    Add hive-lineage.md (#754)
---
 docs/deployment/images/hive-lineage-log.png        | Bin 0 -> 72885 bytes
 docs/deployment/integrated/hive-lineage.md         | 140 +++++++++++++++++++++
 .../current/deployment/images/hive-lineage-log.png | Bin 0 -> 72885 bytes
 .../current/deployment/integrated/hive-lineage.md  | 140 +++++++++++++++++++++
 4 files changed, 280 insertions(+)

diff --git a/docs/deployment/images/hive-lineage-log.png b/docs/deployment/images/hive-lineage-log.png
new file mode 100644
index 00000000000..57caf1f9769
Binary files /dev/null and b/docs/deployment/images/hive-lineage-log.png differ
diff --git a/docs/deployment/integrated/hive-lineage.md b/docs/deployment/integrated/hive-lineage.md
new file mode 100644
index 00000000000..a6cfd274c55
--- /dev/null
+++ b/docs/deployment/integrated/hive-lineage.md
@@ -0,0 +1,140 @@
+---
+title: Integrating Hive lineage
+sidebar_position: 1
+---
+
+This article describes how to collect lineage information for the `Hive` engine in `Linkis`.
+
+
+## 1. Introduction
+
+Hive provides a built-in lineage hook called LineageLogger, which is used to capture and record lineage information generated during query execution. By using the LineageLogger hook, you can capture and log the input and output tables, as well as column-level lineage relationships for queries.
+
+## 2. Collecting Hive lineage into the log
+
+### 2.1 Modify `hive-site.xml`
+
+```shell
+vim $HIVE_HOME/conf/hive-site.xml
+
+# Add the following property inside the <configuration> element:
+<property>
+    <name>hive.exec.post.hooks</name>
+    <value>org.apache.hadoop.hive.ql.hooks.LineageLogger</value>
+</property>
+```
+
+### 2.2 Modify `hive-log4j2.properties`
+
+```shell
+vim $HIVE_HOME/conf/hive-log4j2.properties
+
+# Add the following configuration:
+log4j.logger.org.apache.hadoop.hive.ql.hooks.LineageLogger=INFO
+```
+
+### 2.3 Submit task
+```shell
+sh ./bin/linkis-cli -engineType hive-3.1.3 \
+-codeType hql -code  \
+"CREATE TABLE input_table (
+  column1 INT,
+  column2 STRING
+);
+CREATE TABLE output_table (
+  column3 INT,
+  column4 STRING
+);
+INSERT INTO TABLE output_table
+SELECT column1, column2
+FROM input_table;"  \
+-submitUser hadoop -proxyUser hadoop
+```
+
+### 2.4 View logs
+```shell
+cat /appcom/tmp/hadoop/20230922/hive/946375fe-f189-487c-b3a7-f9fa821edace/logs/stdout
+```
+
+The output is as follows:
+![hive-lineage-log](../images/hive-lineage-log.png)
+
+Details are as follows:
+```json
+{
+  "version":"1.0",
+  "user":"hadoop",
+  "timestamp":1695354104,
+  "duration":15318,
+  "jobIds":[
+    "job_1691375506204_0488"
+  ],
+  "engine":"mr",
+  "database":"default",
+  "hash":"dbb11fce57f10dccb6ef724f66af611c",
+  "queryText":"INSERT INTO TABLE output_table\nSELECT column1, column2\nFROM input_table",
+  "edges":[
+    {
+      "sources":[
+        2
+      ],
+      "targets":[
+        0
+      ],
+      "edgeType":"PROJECTION"
+    },
+    {
+      "sources":[
+        3
+      ],
+      "targets":[
+        1
+      ],
+      "edgeType":"PROJECTION"
+    },
+    {
+      "sources":[
+        2
+      ],
+      "targets":[
+        0
+      ],
+      "expression":"compute_stats(default.input_table.column1, 'hll')",
+      "edgeType":"PROJECTION"
+    },
+    {
+      "sources":[
+        3
+      ],
+      "targets":[
+        1
+      ],
+      "expression":"compute_stats(default.input_table.column2, 'hll')",
+      "edgeType":"PROJECTION"
+    }
+  ],
+  "vertices":[
+    {
+      "id":0,
+      "vertexType":"COLUMN",
+      "vertexId":"default.output_table.column3"
+    },
+    {
+      "id":1,
+      "vertexType":"COLUMN",
+      "vertexId":"default.output_table.column4"
+    },
+    {
+      "id":2,
+      "vertexType":"COLUMN",
+      "vertexId":"default.input_table.column1"
+    },
+    {
+      "id":3,
+      "vertexType":"COLUMN",
+      "vertexId":"default.input_table.column2"
+    }
+  ]
+}
+```
+
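
Note on the lineage record added in section 2.4 above: each edge refers to vertices by their numeric `id`, so the record is easier to read once those indices are resolved to column names. The following is a minimal Python sketch of that lookup, not part of the commit. The helper name `resolve_edges` is hypothetical, and the embedded record is a trimmed copy of the JSON shown in the doc (two of the four edges, no `queryText`).

```python
import json


def resolve_edges(lineage):
    """Return (source, target, expression) triples with vertex names.

    `lineage` is one parsed LineageLogger record. Edges store vertex ids;
    this maps each id to its `vertexId` string. `expression` is None when
    the edge carries no expression.
    """
    names = {v["id"]: v["vertexId"] for v in lineage["vertices"]}
    resolved = []
    for edge in lineage["edges"]:
        for src in edge["sources"]:
            for dst in edge["targets"]:
                resolved.append((names[src], names[dst], edge.get("expression")))
    return resolved


# Trimmed copy of the record from section 2.4 (illustration only).
record = json.loads("""
{
  "edges": [
    {"sources": [2], "targets": [0], "edgeType": "PROJECTION"},
    {"sources": [3], "targets": [1],
     "expression": "compute_stats(default.input_table.column2, 'hll')",
     "edgeType": "PROJECTION"}
  ],
  "vertices": [
    {"id": 0, "vertexType": "COLUMN", "vertexId": "default.output_table.column3"},
    {"id": 1, "vertexType": "COLUMN", "vertexId": "default.output_table.column4"},
    {"id": 2, "vertexType": "COLUMN", "vertexId": "default.input_table.column1"},
    {"id": 3, "vertexType": "COLUMN", "vertexId": "default.input_table.column2"}
  ]
}
""")

for source, target, expression in resolve_edges(record):
    suffix = f"  [{expression}]" if expression else ""
    print(f"{source} -> {target}{suffix}")
```

Read this way, the sample record says that `input_table.column1` feeds `output_table.column3` and `input_table.column2` feeds `output_table.column4`.
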
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/images/hive-lineage-log.png b/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/images/hive-lineage-log.png
new file mode 100644
index 00000000000..57caf1f9769
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/images/hive-lineage-log.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/integrated/hive-lineage.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/integrated/hive-lineage.md
new file mode 100644
index 00000000000..926c61256a6
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/deployment/integrated/hive-lineage.md
@@ -0,0 +1,140 @@
+---
+title: Integrating Hive lineage
+sidebar_position: 1
+---
+
+This article describes how to collect lineage information for the `Hive` engine in `Linkis`.
+
+
+## 1. Introduction
+
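+Hive provides a built-in hook, LineageLogger, which records the lineage information generated during query execution. With the LineageLogger hook you can capture and log a query's input and output tables as well as its column-level lineage relationships.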
+
+## 2. Collecting Hive lineage into the log
+
+### 2.1 Modify `hive-site.xml`
+
+```shell
+vim $HIVE_HOME/conf/hive-site.xml
+
+# Add the following property inside the <configuration> element:
+<property>
+    <name>hive.exec.post.hooks</name>
+    <value>org.apache.hadoop.hive.ql.hooks.LineageLogger</value>
+</property>
+```
+
+### 2.2 Modify `hive-log4j2.properties`
+
+```shell
+vim $HIVE_HOME/conf/hive-log4j2.properties
+
+# Add the following configuration:
+log4j.logger.org.apache.hadoop.hive.ql.hooks.LineageLogger=INFO
+```
+
+### 2.3 Submit task
+```shell
+sh ./bin/linkis-cli -engineType hive-3.1.3 \
+-codeType hql -code  \
+"CREATE TABLE input_table (
+  column1 INT,
+  column2 STRING
+);
+CREATE TABLE output_table (
+  column3 INT,
+  column4 STRING
+);
+INSERT INTO TABLE output_table
+SELECT column1, column2
+FROM input_table;"  \
+-submitUser hadoop -proxyUser hadoop
+```
+
+### 2.4 View logs
+```shell
+cat /appcom/tmp/hadoop/20230922/hive/946375fe-f189-487c-b3a7-f9fa821edace/logs/stdout
+```
+
+The output is as follows:
+![hive-lineage-log](../images/hive-lineage-log.png)
+
+Details are as follows:
+```json
+{
+  "version":"1.0",
+  "user":"hadoop",
+  "timestamp":1695354104,
+  "duration":15318,
+  "jobIds":[
+    "job_1691375506204_0488"
+  ],
+  "engine":"mr",
+  "database":"default",
+  "hash":"dbb11fce57f10dccb6ef724f66af611c",
+  "queryText":"INSERT INTO TABLE output_table\nSELECT column1, column2\nFROM input_table",
+  "edges":[
+    {
+      "sources":[
+        2
+      ],
+      "targets":[
+        0
+      ],
+      "edgeType":"PROJECTION"
+    },
+    {
+      "sources":[
+        3
+      ],
+      "targets":[
+        1
+      ],
+      "edgeType":"PROJECTION"
+    },
+    {
+      "sources":[
+        2
+      ],
+      "targets":[
+        0
+      ],
+      "expression":"compute_stats(default.input_table.column1, 'hll')",
+      "edgeType":"PROJECTION"
+    },
+    {
+      "sources":[
+        3
+      ],
+      "targets":[
+        1
+      ],
+      "expression":"compute_stats(default.input_table.column2, 'hll')",
+      "edgeType":"PROJECTION"
+    }
+  ],
+  "vertices":[
+    {
+      "id":0,
+      "vertexType":"COLUMN",
+      "vertexId":"default.output_table.column3"
+    },
+    {
+      "id":1,
+      "vertexType":"COLUMN",
+      "vertexId":"default.output_table.column4"
+    },
+    {
+      "id":2,
+      "vertexType":"COLUMN",
+      "vertexId":"default.input_table.column1"
+    },
+    {
+      "id":3,
+      "vertexType":"COLUMN",
+      "vertexId":"default.input_table.column2"
+    }
+  ]
+}
+```
+
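
Note on section 2.4 (View logs): the doc inspects the engine `stdout` log by eye. To pull the lineage records out programmatically, something like the Python sketch below may help; it is not part of the commit. It assumes each lineage record is written as a single-line JSON object, possibly preceded by a log prefix. The helper name `extract_lineage` and the sample log lines (including their prefix format) are hypothetical.

```python
import json


def extract_lineage(lines):
    """Yield lineage records found in an iterable of log lines.

    Assumption: a record is a one-line JSON object that contains both
    "edges" and "vertices"; anything before the opening brace is treated
    as a log prefix and ignored.
    """
    decoder = json.JSONDecoder()
    for line in lines:
        start = line.find("{")
        while start != -1:
            try:
                obj, _ = decoder.raw_decode(line, start)
            except json.JSONDecodeError:
                # Not valid JSON from this brace; try the next one.
                start = line.find("{", start + 1)
                continue
            if isinstance(obj, dict) and "edges" in obj and "vertices" in obj:
                yield obj
            break


# Hypothetical log lines; the real prefix depends on the log4j layout.
sample_log = [
    '2023-09-22 11:41:44 INFO LineageLogger: {"version":"1.0","edges":[],"vertices":[]}',
    "2023-09-22 11:41:45 INFO unrelated message",
    "2023-09-22 11:41:46 INFO config {not json}",
]

records = list(extract_lineage(sample_log))
print(len(records), "lineage record(s) found")
```

In practice `lines` would be the opened `stdout` file from the path shown in section 2.4.
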


