This is an automated email from the ASF dual-hosted git repository.
tyrantlucifer pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/incubator-seatunnel.git
The following commit(s) were added to refs/heads/dev by this push:
new da14c2b08 [Doc][Connector-V2] Improve hive sink doc (#2875)
da14c2b08 is described below
commit da14c2b0868c389e158a0ceb59a7d48223b4021b
Author: Eric <[email protected]>
AuthorDate: Sat Sep 24 16:31:31 2022 +0800
[Doc][Connector-V2] Improve hive sink doc (#2875)
* Improve hive sink doc
* Improve hive source doc
---
docs/en/connector-v2/sink/Hive.md | 100 +++++++++++++++++++++++++++++++++---
docs/en/connector-v2/source/Hive.md | 2 +
2 files changed, 94 insertions(+), 8 deletions(-)
diff --git a/docs/en/connector-v2/sink/Hive.md b/docs/en/connector-v2/sink/Hive.md
index 59d8dc5c3..e7e0a8f78 100644
--- a/docs/en/connector-v2/sink/Hive.md
+++ b/docs/en/connector-v2/sink/Hive.md
@@ -8,6 +8,8 @@ Write data to Hive.
In order to use this connector, You must ensure your spark/flink cluster
already integrated hive. The tested hive version is 2.3.9.
+**Tips: Hive Sink Connector does not support array, map and struct data types now**
+
## Key features
- [x] [exactly-once](../../concept/connector-v2-features.md)
@@ -22,14 +24,14 @@ By default, we use 2PC commit to ensure `exactly-once`
## Options
-| name                  | type   | required | default value                                                 |
-|-----------------------| ------ | -------- | ------------------------------------------------------------- |
-| table_name            | string | yes      | -                                                             |
-| metastore_uri         | string | yes      | -                                                             |
-| partition_by          | array  | no       | -                                                             |
-| sink_columns          | array  | no       | When this parameter is empty, all fields are sink columns     |
-| is_enable_transaction | boolean| no       | true                                                          |
-| save_mode             | string | no       | "append"                                                      |
+| name                  | type   | required                                       | default value                                             |
+|-----------------------| ------ |------------------------------------------------|-----------------------------------------------------------|
+| table_name            | string | yes                                            | -                                                         |
+| metastore_uri         | string | yes                                            | -                                                         |
+| partition_by          | array  | required if the hive sink table has partitions | -                                                         |
+| sink_columns          | array  | no                                             | When this parameter is empty, all fields are sink columns |
+| is_enable_transaction | boolean| no                                             | true                                                      |
+| save_mode             | string | no                                             | "append"                                                  |
### table_name [string]
@@ -70,3 +72,85 @@ Streaming Job not support `overwrite`.
}
```
+
+### Example 1
+
+We have a source table like this:
+
+```sql
+create table test_hive_source(
+ test_tinyint TINYINT,
+ test_smallint SMALLINT,
+ test_int INT,
+ test_bigint BIGINT,
+ test_boolean BOOLEAN,
+ test_float FLOAT,
+ test_double DOUBLE,
+ test_string STRING,
+ test_binary BINARY,
+ test_timestamp TIMESTAMP,
+ test_decimal DECIMAL(8,2),
+ test_char CHAR(64),
+ test_varchar VARCHAR(64),
+ test_date DATE,
+ test_array ARRAY<INT>,
+ test_map MAP<STRING, FLOAT>,
+ test_struct STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
+ )
+PARTITIONED BY (test_par1 STRING, test_par2 STRING);
+
+```
+
+We need to read data from the source table and write it to another table:
+
+```sql
+create table test_hive_sink_text_simple(
+ test_tinyint TINYINT,
+ test_smallint SMALLINT,
+ test_int INT,
+ test_bigint BIGINT,
+ test_boolean BOOLEAN,
+ test_float FLOAT,
+ test_double DOUBLE,
+ test_string STRING,
+ test_binary BINARY,
+ test_timestamp TIMESTAMP,
+ test_decimal DECIMAL(8,2),
+ test_char CHAR(64),
+ test_varchar VARCHAR(64),
+ test_date DATE
+ )
+PARTITIONED BY (test_par1 STRING, test_par2 STRING);
+
+```
+
+The job config file can look like this:
+
+```
+env {
+ # You can set flink configuration here
+ execution.parallelism = 3
+ job.name="test_hive_source_to_hive"
+}
+
+source {
+ Hive {
+ table_name = "test_hive.test_hive_source"
+ metastore_uri = "thrift://ctyun7:9083"
+ }
+}
+
+transform {
+}
+
+sink {
+ # choose the Hive sink plugin to write the rows read from the source into the target table
+
+ Hive {
+ table_name = "test_hive.test_hive_sink_text_simple"
+ metastore_uri = "thrift://ctyun7:9083"
+ partition_by = ["test_par1", "test_par2"]
+ sink_columns = ["test_tinyint", "test_smallint", "test_int", "test_bigint", "test_boolean", "test_float", "test_double", "test_string", "test_binary", "test_timestamp", "test_decimal", "test_char", "test_varchar", "test_date", "test_par1", "test_par2"]
+ }
+}
+```
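The `is_enable_transaction` option defaults to `true` and ties into the 2PC commit mentioned under Key features. As a rough intuition only (this is a hedged sketch with a hypothetical `two_phase_write` helper, not SeaTunnel's actual implementation), an exactly-once file sink can stage output first and make it visible atomically on commit:

```python
import os
import tempfile

def two_phase_write(records, final_path):
    """Illustrative 2PC-style commit: phase 1 writes to a staging file,
    phase 2 atomically renames it into place, so readers see either
    nothing or the complete output. Hypothetical helper for intuition only."""
    staging_dir = os.path.dirname(final_path) or "."
    # Phase 1 (prepare): write all records to a temporary staging file.
    fd, staging_path = tempfile.mkstemp(dir=staging_dir, suffix=".staging")
    with os.fdopen(fd, "w") as f:
        for rec in records:
            f.write(rec + "\n")
    # Phase 2 (commit): atomic rename publishes the data all-or-nothing.
    os.replace(staging_path, final_path)
    return final_path
```

If the job fails before the commit phase, only the staging file is left behind and the target path never exposes partial data.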
diff --git a/docs/en/connector-v2/source/Hive.md b/docs/en/connector-v2/source/Hive.md
index 86ebfb2eb..99372fbcb 100644
--- a/docs/en/connector-v2/source/Hive.md
+++ b/docs/en/connector-v2/source/Hive.md
@@ -8,6 +8,8 @@ Read data from Hive.
In order to use this connector, You must ensure your spark/flink cluster
already integrated hive. The tested hive version is 2.3.9.
+**Tips: Hive Sink Connector can not add the partition fields to the output data now**
+
## Key features
- [x] [batch](../../concept/connector-v2-features.md)