This is an automated email from the ASF dual-hosted git repository.
tyrantlucifer pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/incubator-seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new da14c2b08 [Doc][Connector-V2] Improve hive sink doc (#2875)
da14c2b08 is described below
commit da14c2b0868c389e158a0ceb59a7d48223b4021b
Author: Eric <[email protected]>
AuthorDate: Sat Sep 24 16:31:31 2022 +0800
[Doc][Connector-V2] Improve hive sink doc (#2875)
* Improve hive sink doc
* Improve hive source doc
---
docs/en/connector-v2/sink/Hive.md | 100 +++++++++++++++++++++++++++++++++---
docs/en/connector-v2/source/Hive.md | 2 +
2 files changed, 94 insertions(+), 8 deletions(-)
diff --git a/docs/en/connector-v2/sink/Hive.md b/docs/en/connector-v2/sink/Hive.md
index 59d8dc5c3..e7e0a8f78 100644
--- a/docs/en/connector-v2/sink/Hive.md
+++ b/docs/en/connector-v2/sink/Hive.md
@@ -8,6 +8,8 @@ Write data to Hive.
In order to use this connector, You must ensure your spark/flink cluster
already integrated hive. The tested hive version is 2.3.9.
+**Tips: Hive Sink Connector does not support array, map and struct data types now**
+
## Key features
- [x] [exactly-once](../../concept/connector-v2-features.md)
@@ -22,14 +24,14 @@ By default, we use 2PC commit to ensure `exactly-once`
## Options
-| name                  | type   | required | default value                                                 |
-|-----------------------| ------ | -------- | ------------------------------------------------------------- |
-| table_name            | string | yes      | -                                                             |
-| metastore_uri         | string | yes      | -                                                             |
-| partition_by          | array  | no       | -                                                             |
-| sink_columns          | array  | no       | When this parameter is empty, all fields are sink columns     |
-| is_enable_transaction | boolean| no       | true                                                          |
-| save_mode             | string | no       | "append"                                                      |
+| name                  | type   | required                                       | default value                                             |
+|-----------------------| ------ |------------------------------------------------|-----------------------------------------------------------|
+| table_name            | string | yes                                            | -                                                         |
+| metastore_uri         | string | yes                                            | -                                                         |
+| partition_by          | array  | required if the hive sink table has partitions | -                                                         |
+| sink_columns          | array  | no                                             | When this parameter is empty, all fields are sink columns |
+| is_enable_transaction | boolean| no                                             | true                                                      |
+| save_mode             | string | no                                             | "append"                                                  |
### table_name [string]
@@ -70,3 +72,85 @@ Streaming Job not support `overwrite`.
}
```
+
+### Example 1
+
+We have a source table like this:
+
+```sql
+create table test_hive_source(
+ test_tinyint TINYINT,
+ test_smallint SMALLINT,
+ test_int INT,
+ test_bigint BIGINT,
+ test_boolean BOOLEAN,
+ test_float FLOAT,
+ test_double DOUBLE,
+ test_string STRING,
+ test_binary BINARY,
+ test_timestamp TIMESTAMP,
+ test_decimal DECIMAL(8,2),
+ test_char CHAR(64),
+ test_varchar VARCHAR(64),
+ test_date DATE,
+ test_array ARRAY<INT>,
+ test_map MAP<STRING, FLOAT>,
+ test_struct STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
+ )
+PARTITIONED BY (test_par1 STRING, test_par2 STRING);
+
+```
+
+We need to read data from the source table and write it to another table:
+
+```sql
+create table test_hive_sink_text_simple(
+ test_tinyint TINYINT,
+ test_smallint SMALLINT,
+ test_int INT,
+ test_bigint BIGINT,
+ test_boolean BOOLEAN,
+ test_float FLOAT,
+ test_double DOUBLE,
+ test_string STRING,
+ test_binary BINARY,
+ test_timestamp TIMESTAMP,
+ test_decimal DECIMAL(8,2),
+ test_char CHAR(64),
+ test_varchar VARCHAR(64),
+ test_date DATE
+ )
+PARTITIONED BY (test_par1 STRING, test_par2 STRING);
+
+```
+
+The job config file can look like this:
+
+```
+env {
+ # You can set flink configuration here
+ execution.parallelism = 3
+ job.name="test_hive_source_to_hive"
+}
+
+source {
+ Hive {
+ table_name = "test_hive.test_hive_source"
+ metastore_uri = "thrift://ctyun7:9083"
+ }
+}
+
+transform {
+}
+
+sink {
+ # choose the Hive sink plugin to write the rows read from the source into the target table
+
+ Hive {
+ table_name = "test_hive.test_hive_sink_text_simple"
+ metastore_uri = "thrift://ctyun7:9083"
+ partition_by = ["test_par1", "test_par2"]
+ sink_columns = ["test_tinyint", "test_smallint", "test_int", "test_bigint", "test_boolean", "test_float", "test_double", "test_string", "test_binary", "test_timestamp", "test_decimal", "test_char", "test_varchar", "test_date", "test_par1", "test_par2"]
+ }
+}
+```
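The `is_enable_transaction` option defaults to `true` and ties into the 2PC commit mentioned under Key features. As a rough intuition only (this is a hedged sketch with a hypothetical `two_phase_write` helper, not SeaTunnel's actual implementation), an exactly-once file sink can stage output first and make it visible atomically on commit:

```python
import os
import tempfile

def two_phase_write(records, final_path):
    """Illustrative 2PC-style commit: phase 1 writes to a staging file,
    phase 2 atomically renames it into place, so readers see either
    nothing or the complete output. Hypothetical helper for intuition only."""
    staging_dir = os.path.dirname(final_path) or "."
    # Phase 1 (prepare): write all records to a temporary staging file.
    fd, staging_path = tempfile.mkstemp(dir=staging_dir, suffix=".staging")
    with os.fdopen(fd, "w") as f:
        for rec in records:
            f.write(rec + "\n")
    # Phase 2 (commit): atomic rename publishes the data all-or-nothing.
    os.replace(staging_path, final_path)
    return final_path
```

If the job fails before the commit phase, only the staging file is left behind and the target path never exposes partial data.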
diff --git a/docs/en/connector-v2/source/Hive.md b/docs/en/connector-v2/source/Hive.md
index 86ebfb2eb..99372fbcb 100644
--- a/docs/en/connector-v2/source/Hive.md
+++ b/docs/en/connector-v2/source/Hive.md
@@ -8,6 +8,8 @@ Read data from Hive.
In order to use this connector, You must ensure your spark/flink cluster
already integrated hive. The tested hive version is 2.3.9.
+**Tips: Hive Sink Connector can not add the partition fields to the output data now**
+
## Key features
- [x] [batch](../../concept/connector-v2-features.md)