This is an automated email from the ASF dual-hosted git repository.

cancai pushed a commit to branch dev
in repository 
https://gitbox.apache.org/repos/asf/incubator-streampark-website.git


The following commit(s) were added to refs/heads/dev by this push:
     new eb7e528  [Improve] Apache projects add the prefix "Apache" (#361)
eb7e528 is described below

commit eb7e5286b18734d8aab5889055bbffbf7a40bd99
Author: Cancai Cai <[email protected]>
AuthorDate: Tue Apr 30 22:47:59 2024 +0800

    [Improve] Apache projects add the prefix "Apache" (#361)
    
    * [Improve] Apache projects add the prefix "Apache"
    
    * Update 3-hadoop-resource-integration.md
    
    * Update 3-hadoop-resource-integration.md
    
    ---------
    
    Co-authored-by: tison <[email protected]>
---
 docs/flink-k8s/3-hadoop-resource-integration.md    | 70 ++++++++++------------
 .../flink-k8s/3-hadoop-resource-integration.md     | 54 ++++++++---------
 2 files changed, 58 insertions(+), 66 deletions(-)

diff --git a/docs/flink-k8s/3-hadoop-resource-integration.md 
b/docs/flink-k8s/3-hadoop-resource-integration.md
index 116e7b7..049a415 100644
--- a/docs/flink-k8s/3-hadoop-resource-integration.md
+++ b/docs/flink-k8s/3-hadoop-resource-integration.md
@@ -4,17 +4,17 @@ title: 'Hadoop Resource Integration'
 sidebar_position: 3
 ---
 
-## Using Hadoop resource in Flink on K8s
+## Using Apache Hadoop Resources in Flink on Kubernetes
 
-Using Hadoop resources under the StreamPark Flink-K8s runtime, such as 
checkpoint mount HDFS, read and write Hive, etc. The general process is as 
follows:
+Using Hadoop resources under StreamPark's Flink on Kubernetes runtime, such as mounting checkpoints to HDFS or reading from and writing to Hive, follows the general process below; a minimal checkpoint configuration sketch is also shown right after this paragraph.
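As an illustrative sketch only (the namenode address and paths are placeholders, not taken from this guide), checkpoints and savepoints can be pointed at HDFS through the standard Flink options once the HDFS integration below is in place:

```shell
# Hypothetical example: add checkpoint/savepoint locations to flink-conf.yaml.
# Replace namenode:8020 and the paths with values from your own HDFS cluster.
cat >> conf/flink-conf.yaml <<'EOF'
state.checkpoints.dir: hdfs://namenode:8020/flink/checkpoints
state.savepoints.dir: hdfs://namenode:8020/flink/savepoints
EOF
```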
 
-#### 1、HDFS
+### 1. Apache HDFS
 
-​       To put flink on k8s related resources in HDFS, you need to go through 
the following two steps:
+To store Flink on Kubernetes related resources in HDFS, you need to go through the following two steps:
 
-##### i、add `shade jar`
+#### 1.1 Add the shaded jar
 
-​            By default, the flink image pulled from Docker does not include 
hadoop-related jars. Here, flink:1.14.5-scala_2.12-java8 is taken as an 
example, as follows:
+By default, the Flink image pulled from Docker does not include Hadoop-related 
jars. Here, flink:1.14.5-scala_2.12-java8 is taken as an example, as follows:
 
 ```shell
 [flink@ff]  /opt/flink-1.14.5/lib
@@ -24,9 +24,9 @@ flink-dist_2.12-1.14.5.jar  flink-table_2.12-1.14.5.jar       
 log4j-core-2.17.1
 flink-json-1.14.5.jar       log4j-1.2-api-2.17.1.jar           
log4j-slf4j-impl-2.17.1.jar
 ```
 
-​         This is to download the shaded jar and put it in the lib directory 
of flink. Take hadoop2 as an example, download 
`flink-shaded-hadoop-2-uber`:https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.7.5-9.0/flink-shaded-hadoop-2-uber-2.7.5-9.0.jar
+The fix is to download the shaded jar and put it in Flink's lib directory. Taking Hadoop 2 as an example, download `flink-shaded-hadoop-2-uber`: https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.7.5-9.0/flink-shaded-hadoop-2-uber-2.7.5-9.0.jar
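As a quick sketch of this option (the lib path matches the image listing shown above; adjust versions to your setup), the jar can be fetched straight into the Flink lib directory:

```shell
# Download the shaded Hadoop uber jar linked above into Flink's lib directory.
cd /opt/flink-1.14.5/lib
curl -LO https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.7.5-9.0/flink-shaded-hadoop-2-uber-2.7.5-9.0.jar
```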
 
-​      In addition, you can configure the shade jar in a dependent manner in 
the `Dependency` in the StreamPark task configuration. the following 
configuration:
+Alternatively, you can configure the shaded jar as a dependency under `Dependency` in the StreamPark task configuration, for example:
 
 ```xml
 <dependency>
@@ -37,13 +37,13 @@ flink-json-1.14.5.jar       log4j-1.2-api-2.17.1.jar        
   log4j-slf4j-impl-
 </dependency>
 ```
 
-##### ii、add `core-site.xml` and `hdfs-site.xml`
+#### 1.2 Add `core-site.xml` and `hdfs-site.xml`
 
-​            With the shade jar, you also need the corresponding configuration 
file to find the hadoop address. Two configuration files are mainly involved 
here: `core-site.xml` and `hdfs-site.xml`, through the source code analysis of 
flink (the classes involved are mainly: org 
.apache.flink.kubernetes.kubeclient.parameters.AbstractKubernetesParameters), 
the two files have a fixed loading order, as follows:
+With the shaded jar in place, you also need the corresponding configuration files so that Flink can find the Hadoop cluster. Two configuration files are mainly involved: `core-site.xml` and `hdfs-site.xml`. From the Flink source code (mainly the class `org.apache.flink.kubernetes.kubeclient.parameters.AbstractKubernetesParameters`), the two files are loaded in a fixed order, as follows:
 
 ```java
 // The process of finding hadoop configuration files:
-// 1、Find out whether parameters have been 
added:${kubernetes.hadoop.conf.config-map.name}
+// 1. Check whether the parameter ${kubernetes.hadoop.conf.config-map.name} has been set
 @Override
 public Optional<String> getExistingHadoopConfigurationConfigMap() {
     final String existingHadoopConfigMap =
@@ -106,7 +106,7 @@ private List<File> getHadoopConfigurationFileItems(String 
localHadoopConfigurati
         return Collections.emptyList();
     }
 }
-// If the above files are found, it means that there is a hadoop environment. 
The above two files will be parsed into kv pairs, and then constructed into a 
ConfigMap. The naming rules are as follows:
+// If the above files are found, a Hadoop environment exists. The above two 
files will be parsed into key-value pairs and then constructed into a 
ConfigMap. The naming rules are as follows:
 public static String getHadoopConfConfigMapName(String clusterId) {
     return Constants.HADOOP_CONF_CONFIG_MAP_PREFIX + clusterId;
 }
@@ -114,38 +114,34 @@ public static String getHadoopConfConfigMapName(String 
clusterId) {
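Following the lookup order above, one hedged way to provide the two files on Kubernetes is a ConfigMap referenced through `kubernetes.hadoop.conf.config-map.name` (the ConfigMap name, namespace, and submission command below are illustrative):

```shell
# Create a ConfigMap holding the Hadoop client configuration files.
kubectl create cm hadoop-conf --from-file=core-site.xml --from-file=hdfs-site.xml -n flink-test
# Reference it at submission time so that rule 1 of the lookup order matches.
flink run-application --target kubernetes-application \
  -Dkubernetes.hadoop.conf.config-map.name=hadoop-conf \
  ...   # remaining options omitted
```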
 
 
 
-#### 2、Hive
+### 2. Apache Hive
 
-​        To sink data to Apache Hive, or use hive metastore as flink's 
metadata, it is necessary to open the path from Apache Flink to Apache Hive, 
which also needs to go through the following two steps:
+To sink data to Apache Hive, or to use the Hive Metastore for Flink's metadata, you need to connect Apache Flink to Apache Hive, which also takes the following two steps:
 
-##### i、Add Apache Hive related jars
+#### 2.1 Add Hive-related jars
 
-​           As mentioned above, the default flink image does not include 
hive-related jars. The following three hive-related jars need to be placed in 
the lib directory of flink. Here, Apache Hive version 2.3.6 is used as an 
example:
+As mentioned above, the default Flink image does not include Hive-related jars. The following three Hive-related jars need to be placed in Flink's lib directory. Here, Apache Hive version 2.3.6 is used as an example:
 
-​                
a、`hive-exec`:https://repo1.maven.org/maven2/org/apache/hive/hive-exec/2.3.6/hive-exec-2.3.6.jar
+1. `hive-exec`: https://repo1.maven.org/maven2/org/apache/hive/hive-exec/2.3.6/hive-exec-2.3.6.jar
+2. `flink-connector-hive`: https://repo1.maven.org/maven2/org/apache/flink/flink-connector-hive_2.12/1.14.5/flink-connector-hive_2.12-1.14.5.jar
+3. `flink-sql-connector-hive`: https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-hive-2.3.6_2.12/1.14.5/flink-sql-connector-hive-2.3.6_2.12-1.14.5.jar
 
-​                
b、`flink-connector-hive`:https://repo1.maven.org/maven2/org/apache/flink/flink-connector-hive_2.12/1.14.5/flink-connector-hive_2.12-1.14.5.jar
+Similarly, the above Hive-related jars can also be configured as dependencies under `Dependency` in the StreamPark task configuration, which will not be repeated here; a download sketch follows below.
 
-​                
c、`flink-sql-connector-hive`:https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-hive-2.3.6_2.12/1.14.5/flink-sql-connector-hive-2.3.6_2.12-1.14.5.jar
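A hypothetical download sketch for the three jars listed above (same lib layout as in section 1.1; adjust versions to match your Hive and Flink):

```shell
# Download the Hive-related jars listed above into Flink's lib directory.
cd /opt/flink-1.14.5/lib
curl -LO https://repo1.maven.org/maven2/org/apache/hive/hive-exec/2.3.6/hive-exec-2.3.6.jar
curl -LO https://repo1.maven.org/maven2/org/apache/flink/flink-connector-hive_2.12/1.14.5/flink-connector-hive_2.12-1.14.5.jar
curl -LO https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-hive-2.3.6_2.12/1.14.5/flink-sql-connector-hive-2.3.6_2.12-1.14.5.jar
```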
+#### 2.2 Add Apache Hive configuration file (hive-site.xml)
 
-​            Similarly, the above-mentioned hive-related jars can also be 
dependently configured in the `Dependency` in the task configuration of 
StreamPark in a dependent manner, which will not be repeated here.
+Unlike HDFS, the Flink source code has no default mechanism for loading the Hive configuration file, so developers need to add it manually. There are three main approaches:
 
-##### ii、Add Apache Hive configuration file (hive-site.xml)
-
-​             The difference from hdfs is that there is no default loading 
method for the hive configuration file in the flink source code, so developers 
need to manually add the hive configuration file. There are three main methods 
here:
-
-​                  a. Put hive-site.xml in the custom image of flink, it is 
generally recommended to put it under the `/opt/flink/` directory in the image
-
-​                  b. Put hive-site.xml behind the remote storage system, such 
as HDFS, and load it when it is used
-
-​                  c. Mount hive-site.xml in k8s in the form of ConfigMap. It 
is recommended to use this method, as follows:
+1. Put hive-site.xml into a custom Flink image; it is generally recommended to place it under the `/opt/flink/` directory in the image (a Dockerfile sketch follows after this list)
+2. Put hive-site.xml in a remote storage system, such as HDFS, and load it when it is used
+3. Mount hive-site.xml into Kubernetes as a ConfigMap. This method is recommended; see the kubectl example below:
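For approach 1, a minimal sketch (image tag and registry are placeholders) of baking hive-site.xml into a custom image:

```shell
# Build a custom Flink image that ships hive-site.xml under /opt/flink/.
cat > Dockerfile <<'EOF'
FROM flink:1.14.5-scala_2.12-java8
COPY hive-site.xml /opt/flink/hive-site.xml
EOF
docker build -t <your-registry>/flink:1.14.5-hive .
```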
 
 ```shell
-# 1、Mount the hive-site.xml at the specified location in the specified 
namespace
+# 1. Mount the hive-site.xml at the specified location in the specified 
namespace
 kubectl create cm hive-conf --from-file=hive-site.xml -n flink-test
-# 2、View the hive-site.xml mounted to k8s
+# 2. View the hive-site.xml mounted to k8s
 kubectl describe cm hive-conf -n flink-test 
-# 3、Mount this cm to the specified directory inside the container
+# 3. Mount this cm to the specified directory inside the container
 spec:
   containers:
     - name: flink-main-container
@@ -165,11 +161,7 @@ spec:
 
 #### Conclusion
 
-​        Through the above method, Apache Flink can be connected with Apache 
Hadoop and Hive. This method can be extended to general, that is, flink and 
external systems such as redis, mongo, etc., generally require the following 
two steps:
-
-​        i. Load the connector jar of the specified external service
-
-​        ii. If there is, load the specified configuration file into the flink 
system
-
-
+Through the above method, Apache Flink can be connected with Apache Hadoop and Hive. The approach generalizes: connecting Flink with external systems such as Redis, MongoDB, etc. usually takes the following two steps:
 
+1. Load the connector jar for the external service in question;
+2. If such a file exists, load the corresponding configuration file into the Flink system.
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/flink-k8s/3-hadoop-resource-integration.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/flink-k8s/3-hadoop-resource-integration.md
index 2380be8..5050d0c 100755
--- 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/flink-k8s/3-hadoop-resource-integration.md
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/flink-k8s/3-hadoop-resource-integration.md
@@ -4,17 +4,17 @@ title: 'Hadoop 资源集成'
 sidebar_position: 3
 ---
 
-## 在 Flink on K8s 上使用 Hadoop 资源
+## 在 Flink on Kubernetes 上使用 Apache Hadoop 资源
 
-在 StreamPark Flink-K8s runtime 下使用 Hadoop 资源,如 checkpoint 挂载 HDFS、读写 Hive 
等,大概流程如下:
+在 StreamPark Flink-Kubernetes runtime 下使用 Hadoop 资源,如 checkpoint 挂载 HDFS、读写 
Hive 等,大概流程如下:
 
-#### 1、HDFS
+#### 1. Apache HDFS
 
 ​       如需将 flink on k8s 相关资源放在 HDFS 中,需要经过以下两个步骤:
 
-##### i、添加 `shade jar`
+##### 1.1 添加 shade jar
 
-​           默认情况下,从 Docker 上 pull 的 flink 镜像是不包括 hadoop 相关的 jar,这里以 
flink:1.14.5-scala_2.12-java8 为例,如下:
+​           默认情况下,从 Docker 上 pull 的 Flink 镜像是不包括 Hadoop 相关的 jar,这里以 
flink:1.14.5-scala_2.12-java8 为例,如下:
 
 ```shell
 [flink@ff]  /opt/flink-1.14.5/lib
@@ -24,7 +24,7 @@ flink-dist_2.12-1.14.5.jar  flink-table_2.12-1.14.5.jar       
 log4j-core-2.17.1
 flink-json-1.14.5.jar       log4j-1.2-api-2.17.1.jar           
log4j-slf4j-impl-2.17.1.jar
 ```
 
-​         这是需要将 shade jar 下载下来,然后放在 flink 的 lib 目录下,这里 以hadoop2 为例,下载 
`flink-shaded-hadoop-2-uber`:https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.7.5-9.0/flink-shaded-hadoop-2-uber-2.7.5-9.0.jar
+​         这是需要将 shade jar 下载下来,然后放在 Flink 的 lib 目录下,这里 以hadoop2 为例;下载 
`flink-shaded-hadoop-2-uber`:https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.7.5-9.0/flink-shaded-hadoop-2-uber-2.7.5-9.0.jar
 
 ​      另外,可以将 shade jar 以依赖的方式在 StreamPark 的任务配置中的`Dependency` 进行依赖配置,如下配置:
 
@@ -37,13 +37,13 @@ flink-json-1.14.5.jar       log4j-1.2-api-2.17.1.jar        
   log4j-slf4j-impl-
 </dependency>
 ```
 
-##### ii、添加 core-site.xml 和 hdfs-site.xml
+##### 1.2 添加 core-site.xml 和 hdfs-site.xml
 
-​            有了 shade jar 还需要相应的配置文件去找到 hadoop 
地址,这里主要涉及到两个配置文件:core-site.xml和hdfs-site.xml,通过 flink 
的源码分析(涉及到的类主要是:org.apache.flink.kubernetes.kubeclient.parameters.AbstractKubernetesParameters),该两文件有固定的加载顺序,如下:
+​            有了 shaded jar 还需要相应的配置文件去找到 Hadoop 
地址,这里主要涉及到两个配置文件:core-site.xml和hdfs-site.xml,通过 flink 
的源码分析(涉及到的类主要是:org.apache.flink.kubernetes.kubeclient.parameters.AbstractKubernetesParameters),该两文件有固定的加载顺序,如下:
 
 ```java
 // 寻找 hadoop 配置文件的流程
-// 1、先去寻在是否添加了参数:kubernetes.hadoop.conf.config-map.name
+// 1. 先去寻在是否添加了参数:kubernetes.hadoop.conf.config-map.name
 @Override
 public Optional<String> getExistingHadoopConfigurationConfigMap() {
     final String existingHadoopConfigMap =
@@ -57,12 +57,12 @@ public Optional<String> 
getExistingHadoopConfigurationConfigMap() {
 
 @Override
 public Optional<String> getLocalHadoopConfigurationDirectory() {
-    // 2、如果没有1中指定的参数,查找提交 native 命令的本地环境是否有环境变量:HADOOP_CONF_DIR
+    // 2. 如果没有1中指定的参数,查找提交 native 命令的本地环境是否有环境变量:HADOOP_CONF_DIR
     final String hadoopConfDirEnv = 
System.getenv(Constants.ENV_HADOOP_CONF_DIR);
     if (StringUtils.isNotBlank(hadoopConfDirEnv)) {
         return Optional.of(hadoopConfDirEnv);
     }
-    // 3、如果没有2中环境变量,再继续看是否有环境变量:HADOOP_HOME
+    // 3. 如果没有2中环境变量,再继续看是否有环境变量:HADOOP_HOME
     final String hadoopHomeEnv = System.getenv(Constants.ENV_HADOOP_HOME);
     if (StringUtils.isNotBlank(hadoopHomeEnv)) {
         // Hadoop 2.x
@@ -106,7 +106,7 @@ private List<File> getHadoopConfigurationFileItems(String 
localHadoopConfigurati
         return Collections.emptyList();
     }
 }
-// 如果找到上述文件,说明有 hadoop 的环境,将会把上述两个文件解析为 kv 对,然后构建成一个 ConfigMap,名字命名规则如下:
+// 如果找到上述文件,说明有 Hadoop 的环境,将会把上述两个文件解析为 key-value 对,然后构建成一个 ConfigMap,名字命名规则如下:
 public static String getHadoopConfConfigMapName(String clusterId) {
     return Constants.HADOOP_CONF_CONFIG_MAP_PREFIX + clusterId;
 }
@@ -114,38 +114,38 @@ public static String getHadoopConfConfigMapName(String 
clusterId) {
 
 
 
-#### 2、Hive
+#### 2. Apache Hive
 
-​        将数据 sink 到 hive,或者以 hive 的 metastore 作为 flink 的元数据,都需要打通 flink 到 hive 
的路径,同样需要经过一下两个步骤:
+​        将数据 sink 到 Hive,或者以 Hive 的 Metastore 作为 Flink 的元数据,都需要打通 flink 到 hive 
的路径,同样需要经过一下两个步骤:
 
 ##### i、添加 hive 相关的 jar
 
 ​           如上所述,默认 flink 镜像是不包括 hive 相关的 jar,需要将 hive 相关的如下三个 jar 放在 flink 的 
lib 目录下,这里以 hive 2.3.6 版本为例:
 
-​                
a、`hive-exec`:https://repo1.maven.org/maven2/org/apache/hive/hive-exec/2.3.6/hive-exec-2.3.6.jar
+​                1. 
`hive-exec`:https://repo1.maven.org/maven2/org/apache/hive/hive-exec/2.3.6/hive-exec-2.3.6.jar
 
-​                
b、`flink-connector-hive`:https://repo1.maven.org/maven2/org/apache/flink/flink-connector-hive_2.12/1.14.5/flink-connector-hive_2.12-1.14.5.jar
+​                2. 
`flink-connector-hive`:https://repo1.maven.org/maven2/org/apache/flink/flink-connector-hive_2.12/1.14.5/flink-connector-hive_2.12-1.14.5.jar
 
-​                
c、`flink-sql-connector-hive`:https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-hive-2.3.6_2.12/1.14.5/flink-sql-connector-hive-2.3.6_2.12-1.14.5.jar
+​                3. 
`flink-sql-connector-hive`:https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-hive-2.3.6_2.12/1.14.5/flink-sql-connector-hive-2.3.6_2.12-1.14.5.jar
 
 ​               同样,也可以将上述 hive 相关 jar 以依赖的方式在 StreamPark 的任务配置中的`Dependency` 
进行依赖配置,这里不再赘述。
 
-##### ii、添加 hive 的配置文件(hive-site.xml)
+##### 2.2 添加 hive 的配置文件(hive-site.xml)
 
 ​             和 hdfs 所不同的是,flink 源码中并没有 hive 的配置文件的默认的加载方式,因此需要开发者手动添加 hive 
的配置文件,这里主要采用三种方式:
 
-​              a、将 hive-site.xml 打在 flink 的自定义镜像之中,一般建议放在镜像里的`/opt/flink/`目录之下
+​              1. 将 hive-site.xml 打在 flink 的自定义镜像之中,一般建议放在镜像里的`/opt/flink/`目录之下
 
-​              b、将 hive-site.xml 放在远端的存储系统之后,例如 HDFS,在使用的时候进行加载
+​              2. 将 hive-site.xml 放在远端的存储系统之后,例如 HDFS,在使用的时候进行加载
 
-​              c、将 hive-site.xml 以 ConfigMap 的形式挂载在 k8s 之中,建议使用此种方式,如下:
+​              3. 将 hive-site.xml 以 ConfigMap 的形式挂载在 k8s 之中,建议使用此种方式,如下:
 
 ```shell
-# 1、在指定的 ns 中挂载指定位置的 hive-site.xml
+# 1. 在指定的 ns 中挂载指定位置的 hive-site.xml
 kubectl create cm hive-conf --from-file=hive-site.xml -n flink-test
-# 2、查看挂载到 k8s 中的 hive-site.xml
+# 2. 查看挂载到 k8s 中的 hive-site.xml
 kubectl describe cm hive-conf -n flink-test 
-# 3、将此 cm 挂载到容器内指定的目录
+# 3. 将此 cm 挂载到容器内指定的目录
 spec:
   containers:
     - name: flink-main-container
@@ -165,11 +165,11 @@ spec:
 
 #### 总结
 
-​        通过以上的方式便可以将 flink 和 hadoop、hive 打通,此方法可推广至一般,即 flink 
与外部系统如redis、mongo 等连通,一般需要如下两个步骤:
+​        通过以上的方式便可以将 Flink 和 Hadoop、Hive 打通,此方法可推广至一般,即 flink 
与外部系统如redis、mongo 等连通,一般需要如下两个步骤:
 
-​        i、加载指定外部服务的 connector jar
+​        1. 加载指定外部服务的 connector jar
 
-​      ii、如果有,加载指定的配置文件到 flink 系统之中
+​      2. 如果有,加载指定的配置文件到 flink 系统之中
 
 
 
