(doris) branch master updated: [fix](stream load)add test case and doc for arrow type of stream load (#28098)

morningman Thu, 21 Dec 2023 21:18:56 -0800

This is an automated email from the ASF dual-hosted git repository.

morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git



The following commit(s) were added to refs/heads/master by this push:
     new 7710c85904a [fix](stream load)add test case and doc for arrow type of 
stream load (#28098)
7710c85904a is described below

commit 7710c85904acbedd3e2a265da2e2108b63a3c5c0
Author: wuwenchi <[email protected]>
AuthorDate: Fri Dec 22 13:18:44 2023 +0800

    [fix](stream load)add test case and doc for arrow type of stream load 
(#28098)
    
    add test case and doc for arrow type of stream load
---
 .../import/import-way/stream-load-manual.md        |   6 +-
 docs/en/docs/ecosystem/spark-doris-connector.md    |   3 +-
 .../import/import-way/stream-load-manual.md        |   7 +-
 docs/zh-CN/docs/ecosystem/spark-doris-connector.md |   3 +-
 .../spark_connector/spark_connector_arrow.out      |  37 ++++++
 .../spark_connector/spark_connector_arrow.groovy   | 146 +++++++++++++++++++++
 6 files changed, 196 insertions(+), 6 deletions(-)

diff --git a/docs/en/docs/data-operate/import/import-way/stream-load-manual.md 
b/docs/en/docs/data-operate/import/import-way/stream-load-manual.md
index 8df3753e828..bb2cdde4e3d 100644
--- a/docs/en/docs/data-operate/import/import-way/stream-load-manual.md
+++ b/docs/en/docs/data-operate/import/import-way/stream-load-manual.md
@@ -166,9 +166,11 @@ Stream load uses HTTP protocol, so all parameters related 
to import tasks are se
   
 + format
 
-  Specify the import data format, support csv, json, the default is csv
+  Specify the import data format, support csv, json, arrow, the default is csv
 
-  <version since="1.2">supports `csv_with_names` (csv file line header 
filter), `csv_with_names_and_types` (csv file first two lines filter), parquet, 
orc</version>
+  <version since="1.2">supports `csv_with_names` (csv file line header 
filter), `csv_with_names_and_types` (csv file first two lines filter), parquet, 
orc.</version>
+
+  <version since="2.1.0">supports `arrow` format.</version>
 
 + exec\_mem\_limit
 
diff --git a/docs/en/docs/ecosystem/spark-doris-connector.md 
b/docs/en/docs/ecosystem/spark-doris-connector.md
index 8921c68d09f..80874436421 100644
--- a/docs/en/docs/ecosystem/spark-doris-connector.md
+++ b/docs/en/docs/ecosystem/spark-doris-connector.md
@@ -268,7 +268,8 @@ kafkaSource.selectExpr("CAST(key AS STRING)", "CAST(value 
as STRING)")
 | doris.write.fields               | --                | Specifies the fields 
(or the order of the fields) to write to the Doris table, fileds separated by 
commas.<br/>By default, all fields are written in the order of Doris table 
fields.                                                                         
                                                                                
                                                                                
                      [...]
 | doris.sink.batch.size            | 100000            | Maximum number of 
lines in a single write BE                                                      
                                                                                
                                                                                
                                                                                
                                                                                
                  [...]
 | doris.sink.max-retries           | 0                 | Number of retries 
after writing BE failed                                                         
                                                                                
                                                                                
                                                                                
                                                                                
                  [...]
-| sink.properties.*                | --                | Import parameters for 
Stream Load. <br/>For example:<br/>Specify column separator: 
`'doris.sink.properties.column_separator' = ','`, specify write data format: 
`'doris.sink.properties.format' = 'json'` [More parameter 
details](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/import-way/stream-load-manual)
                                                                                
                                   |
+| doris.sink.properties.format     | --                | Data format of the 
stream load.<br/>Supported formats: csv, json, arrow(since version 1.4.0)<br/> 
[More Multi-parameter 
details](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/import-way/stream-load-manual)
                                                                                
                                                                                
                                                         [...]
+| doris.sink.properties.*          | --                | Import parameters for 
Stream Load. <br/>For example:<br/>Specify column separator: 
`'doris.sink.properties.column_separator' = ','`.<br/>[More parameter 
details](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/import-way/stream-load-manual)
                                                                                
                                                                                
                        [...]
 | doris.sink.task.partition.size   | --                | The number of 
partitions corresponding to the Writing task. After filtering and other 
operations, the number of partitions written in Spark RDD may be large, but the 
number of records corresponding to each Partition is relatively small, 
resulting in increased writing frequency and waste of computing resources. The 
smaller this value is set, the less Doris write frequency and less Doris merge 
pressure. It is generally used with dori [...]
 | doris.sink.task.use.repartition  | false             | Whether to use 
repartition mode to control the number of partitions written by Doris. The 
default value is false, and coalesce is used (note: if there is no Spark action 
before the write, the whole computation will be less parallel). If it is set to 
true, then repartition is used (note: you can set the final number of 
partitions at the cost of shuffle).                                             
                                    [...]
 | doris.sink.batch.interval.ms     | 50                | The interval time of 
each batch sink, unit ms.                                                       
                                                                                
                                                                                
                                                                                
                                                                                
               [...]
diff --git 
a/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md 
b/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md
index c96dfdbd2cc..0f4dc28c737 100644
--- a/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md
+++ b/docs/zh-CN/docs/data-operate/import/import-way/stream-load-manual.md
@@ -154,8 +154,11 @@ Stream Load 由于使用的是 HTTP 协议，所以所有导入任务有关的
 
 - format
 
-  指定导入数据格式，支持 `csv` 和 `json`，默认是 `csv`
-  <version since="1.2"> 支持 `csv_with_names` 
(csv文件行首过滤)、`csv_with_names_and_types`(csv文件前两行过滤)、`parquet`、`orc`</version>
+  指定导入数据格式，支持 `csv`、 `json` 和 `arrow` ，默认是 `csv`。
+
+  <version since="1.2"> 支持 `csv_with_names` 
(csv文件行首过滤)、`csv_with_names_and_types`(csv文件前两行过滤)、`parquet`、`orc`。</version>
+
+  <version since="2.1.0"> 支持 `arrow`格式。</version>
 
   ```text
   列顺序变换例子：原始数据有三列(src_c1,src_c2,src_c3), 目前doris表也有三列（dst_c1,dst_c2,dst_c3）
diff --git a/docs/zh-CN/docs/ecosystem/spark-doris-connector.md 
b/docs/zh-CN/docs/ecosystem/spark-doris-connector.md
index 70a8fcac691..8c4c84ad663 100644
--- a/docs/zh-CN/docs/ecosystem/spark-doris-connector.md
+++ b/docs/zh-CN/docs/ecosystem/spark-doris-connector.md
@@ -272,7 +272,8 @@ kafkaSource.selectExpr("CAST(value as STRING)")
 | doris.write.fields               | --                | 
指定写入Doris表的字段或者字段顺序，多列之间使用逗号分隔。<br />默认写入时要按照Doris表字段顺序写入全部字段。                  
                                                                                
                                                                                
                        |
 | doris.sink.batch.size            | 100000            | 单次写BE的最大行数            
                                                                                
                                                                                
                                                                                
  |
 | doris.sink.max-retries           | 0                 | 写BE失败之后的重试次数          
                                                                                
                                                                                
                                                                                
  |
-| doris.sink.properties.*          | --                | Stream Load 
的导入参数。<br/>例如:<br/>指定列分隔符:`'doris.sink.properties.column_separator' = ','`、 
指定写入数据格式:`'doris.sink.properties.format' = 'json'` 
[更多参数详情](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/import-way/stream-load-manual)
 |
+| doris.sink.properties.format     | csv               | Stream Load 
的数据格式。<br/>共支持3种格式：csv，json，arrow（1.4.0版本开始支持）<br/> 
[更多参数详情](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/import-way/stream-load-manual)
 |
+| doris.sink.properties.*          | --                | Stream Load 
的导入参数。<br/>例如:<br/>指定列分隔符:`'doris.sink.properties.column_separator' = 
','`等<br/> 
[更多参数详情](https://doris.apache.org/zh-CN/docs/dev/data-operate/import/import-way/stream-load-manual)
 |
 | doris.sink.task.partition.size   | --                | Doris写入任务对应的 
Partition 个数。Spark RDD 经过过滤等操作，最后写入的 Partition 数可能会比较大，但每个 Partition 
对应的记录数比较少，导致写入频率增加和计算资源浪费。<br/>此数值设置越小，可以降低 Doris 写入频率，减少 Doris 合并压力。该参数配合 
doris.sink.task.use.repartition 使用。                                             
                           |
 | doris.sink.task.use.repartition  | false             | 是否采用 repartition 方式控制 
Doris写入 Partition数。默认值为 false，采用 coalesce 方式控制（注意: 如果在写入之前没有 Spark action 
算子，可能会导致整个计算并行度降低）。<br/>如果设置为 true，则采用 repartition 方式（注意: 可设置最后 Partition 
数，但会额外增加 shuffle 开销）。                                                           
              |
 | doris.sink.batch.interval.ms     | 50                | 每个批次sink的间隔时间，单位 ms。  
                                                                                
                                                                                
                                                                                
  |
diff --git 
a/regression-test/data/connector_p0/spark_connector/spark_connector_arrow.out 
b/regression-test/data/connector_p0/spark_connector/spark_connector_arrow.out
new file mode 100644
index 00000000000..ac2b00166fa
--- /dev/null
+++ 
b/regression-test/data/connector_p0/spark_connector/spark_connector_arrow.out
@@ -0,0 +1,37 @@
+-- This file is automatically generated. You should know what you did if you 
want to edit this
+-- !q01 --
+1      true    1       2       3       4       123456789       6.6     7.7     
3.12000 2023-09-08      2023-09-08T17:12:34.123456      char    varchar string
+1      true    1       2       3       4       123456789       6.6     7.7     
3.12000 2023-09-08      2023-09-08T17:12:34.123456      char    varchar string
+1      true    1       2       3       4       123456789       6.6     7.7     
3.12000 2023-09-08      2023-09-08T17:12:34.123456      char    varchar string
+1      true    1       2       3       4       123456789       6.6     7.7     
3.12000 2023-09-08      2023-09-08T17:12:34.123456      char    varchar string
+1      true    1       2       3       4       123456789       6.6     7.7     
3.12000 2023-09-08      2023-09-08T17:12:34.123456      char    varchar string
+1      true    1       2       3       4       123456789       6.6     7.7     
3.12000 2023-09-08      2023-09-08T17:12:34.123456      char    varchar string
+1      true    1       2       3       4       123456789       6.6     7.7     
3.12000 2023-09-08      2023-09-08T17:12:34.123456      char    varchar string
+
+-- !q02 --
+1      [1, 0, 0, 1, 1] [1, 2, 3]       [2, 12, 32]     [3, 4, 5, 6]    [4, 5, 
6]       [123456789, 987654321, 123789456]       [6.6, 6.7, 7.8] [7.7, 8.8, 
8.8999996185302734]  [3.12000, 1.12345]      ["2023-09-08", "2027-10-28"]    
["2023-09-08 17:12:34.123456", "2024-09-08 18:12:34.123456"]    ["char", 
"char2"]       ["varchar", "varchar2"] ["string", "string2"]
+1      [1, 0, 0, 1, 1] [1, 2, 3]       [2, 12, 32]     [3, 4, 5, 6]    [4, 5, 
6]       [123456789, 987654321, 123789456]       [6.6, 6.7, 7.8] [7.7, 8.8, 
8.8999996185302734]  [3.12000, 1.12345]      ["2023-09-08", "2027-10-28"]    
["2023-09-08 17:12:34.123456", "2024-09-08 18:12:34.123456"]    ["char", 
"char2"]       ["varchar", "varchar2"] ["string", "string2"]
+1      [1, 0, 0, 1, 1] [1, 2, 3]       [2, 12, 32]     [3, 4, 5, 6]    [4, 5, 
6]       [123456789, 987654321, 123789456]       [6.6, 6.7, 7.8] [7.7, 8.8, 
8.8999996185302734]  [3.12000, 1.12345]      ["2023-09-08", "2027-10-28"]    
["2023-09-08 17:12:34.123456", "2024-09-08 18:12:34.123456"]    ["char", 
"char2"]       ["varchar", "varchar2"] ["string", "string2"]
+1      [1, 0, 0, 1, 1] [1, 2, 3]       [2, 12, 32]     [3, 4, 5, 6]    [4, 5, 
6]       [123456789, 987654321, 123789456]       [6.6, 6.7, 7.8] [7.7, 8.8, 
8.8999996185302734]  [3.12000, 1.12345]      ["2023-09-08", "2027-10-28"]    
["2023-09-08 17:12:34.123456", "2024-09-08 18:12:34.123456"]    ["char", 
"char2"]       ["varchar", "varchar2"] ["string", "string2"]
+1      [1, 0, 0, 1, 1] [1, 2, 3]       [2, 12, 32]     [3, 4, 5, 6]    [4, 5, 
6]       [123456789, 987654321, 123789456]       [6.6, 6.7, 7.8] [7.7, 8.8, 
8.8999996185302734]  [3.12000, 1.12345]      ["2023-09-08", "2027-10-28"]    
["2023-09-08 17:12:34.123456", "2024-09-08 18:12:34.123456"]    ["char", 
"char2"]       ["varchar", "varchar2"] ["string", "string2"]
+1      [1, 0, 0, 1, 1] [1, 2, 3]       [2, 12, 32]     [3, 4, 5, 6]    [4, 5, 
6]       [123456789, 987654321, 123789456]       [6.6, 6.7, 7.8] [7.7, 8.8, 
8.8999996185302734]  [3.12000, 1.12345]      ["2023-09-08", "2027-10-28"]    
["2023-09-08 17:12:34.123456", "2024-09-08 18:12:34.123456"]    ["char", 
"char2"]       ["varchar", "varchar2"] ["string", "string2"]
+1      [1, 0, 0, 1, 1] [1, 2, 3]       [2, 12, 32]     [3, 4, 5, 6]    [4, 5, 
6]       [123456789, 987654321, 123789456]       [6.6, 6.7, 7.8] [7.7, 8.8, 
8.8999996185302734]  [3.12000, 1.12345]      ["2023-09-08", "2027-10-28"]    
["2023-09-08 17:12:34.123456", "2024-09-08 18:12:34.123456"]    ["char", 
"char2"]       ["varchar", "varchar2"] ["string", "string2"]
+
+-- !q03 --
+1      {1:1, 0:1}      {1:2, 3:4}      {2:4, 5:6}      {3:4, 7:8}      {4:5, 
1:2}      {123456789:987654321, 789456123:456789123}      {6.6:8.8, 9.9:10.1}   
  {7.7:1.1, 2.2:3.3}      {3.12000:1.23000, 2.34000:5.67000}      
{"2023-09-08":"2024-09-08", "1023-09-08":"2023-09-08"}  {"":"2023-09-08 
17:12:34.123456", "3023-09-08 17:12:34.123456":"4023-09-08 17:12:34.123456"}    
{"char":"char2", "char2":"char3"}       {"varchar":"varchar2", 
"varchar3":"varchar4"}   {"string":"string2", "string3":"string4"}
+1      {1:1, 0:1}      {1:2, 3:4}      {2:4, 5:6}      {3:4, 7:8}      {4:5, 
1:2}      {123456789:987654321, 789456123:456789123}      {6.6:8.8, 9.9:10.1}   
  {7.7:1.1, 2.2:3.3}      {3.12000:1.23000, 2.34000:5.67000}      
{"2023-09-08":"2024-09-08", "1023-09-08":"2023-09-08"}  {"":"2023-09-08 
17:12:34.123456", "3023-09-08 17:12:34.123456":"4023-09-08 17:12:34.123456"}    
{"char":"char2", "char2":"char3"}       {"varchar":"varchar2", 
"varchar3":"varchar4"}   {"string":"string2", "string3":"string4"}
+1      {1:1, 0:1}      {1:2, 3:4}      {2:4, 5:6}      {3:4, 7:8}      {4:5, 
1:2}      {123456789:987654321, 789456123:456789123}      {6.6:8.8, 9.9:10.1}   
  {7.7:1.1, 2.2:3.3}      {3.12000:1.23000, 2.34000:5.67000}      
{"2023-09-08":"2024-09-08", "1023-09-08":"2023-09-08"}  {"":"2023-09-08 
17:12:34.123456", "3023-09-08 17:12:34.123456":"4023-09-08 17:12:34.123456"}    
{"char":"char2", "char2":"char3"}       {"varchar":"varchar2", 
"varchar3":"varchar4"}   {"string":"string2", "string3":"string4"}
+1      {1:1, 0:1}      {1:2, 3:4}      {2:4, 5:6}      {3:4, 7:8}      {4:5, 
1:2}      {123456789:987654321, 789456123:456789123}      {6.6:8.8, 9.9:10.1}   
  {7.7:1.1, 2.2:3.3}      {3.12000:1.23000, 2.34000:5.67000}      
{"2023-09-08":"2024-09-08", "1023-09-08":"2023-09-08"}  {"":"2023-09-08 
17:12:34.123456", "3023-09-08 17:12:34.123456":"4023-09-08 17:12:34.123456"}    
{"char":"char2", "char2":"char3"}       {"varchar":"varchar2", 
"varchar3":"varchar4"}   {"string":"string2", "string3":"string4"}
+1      {1:1, 0:1}      {1:2, 3:4}      {2:4, 5:6}      {3:4, 7:8}      {4:5, 
1:2}      {123456789:987654321, 789456123:456789123}      {6.6:8.8, 9.9:10.1}   
  {7.7:1.1, 2.2:3.3}      {3.12000:1.23000, 2.34000:5.67000}      
{"2023-09-08":"2024-09-08", "1023-09-08":"2023-09-08"}  {"":"2023-09-08 
17:12:34.123456", "3023-09-08 17:12:34.123456":"4023-09-08 17:12:34.123456"}    
{"char":"char2", "char2":"char3"}       {"varchar":"varchar2", 
"varchar3":"varchar4"}   {"string":"string2", "string3":"string4"}
+1      {1:1, 0:1}      {1:2, 3:4}      {2:4, 5:6}      {3:4, 7:8}      {4:5, 
1:2}      {123456789:987654321, 789456123:456789123}      {6.6:8.8, 9.9:10.1}   
  {7.7:1.1, 2.2:3.3}      {3.12000:1.23000, 2.34000:5.67000}      
{"2023-09-08":"2024-09-08", "1023-09-08":"2023-09-08"}  {"":"2023-09-08 
17:12:34.123456", "3023-09-08 17:12:34.123456":"4023-09-08 17:12:34.123456"}    
{"char":"char2", "char2":"char3"}       {"varchar":"varchar2", 
"varchar3":"varchar4"}   {"string":"string2", "string3":"string4"}
+1      {1:1, 0:1}      {1:2, 3:4}      {2:4, 5:6}      {3:4, 7:8}      {4:5, 
1:2}      {123456789:987654321, 789456123:456789123}      {6.6:8.8, 9.9:10.1}   
  {7.7:1.1, 2.2:3.3}      {3.12000:1.23000, 2.34000:5.67000}      
{"2023-09-08":"2024-09-08", "1023-09-08":"2023-09-08"}  {"":"2023-09-08 
17:12:34.123456", "3023-09-08 17:12:34.123456":"4023-09-08 17:12:34.123456"}    
{"char":"char2", "char2":"char3"}       {"varchar":"varchar2", 
"varchar3":"varchar4"}   {"string":"string2", "string3":"string4"}
+
+-- !q04 --
+1      {"c_bool": 1, "c_tinyint": 1, "c_smallint": 2, "c_int": 3, "c_bigint": 
4, "c_largeint": 123456789, "c_float": 6.6, "c_double": 7.7, "c_decimal": 
3.12000, "c_date": "2023-09-08", "c_datetime": "2023-09-08 17:12:34.123456", 
"c_char": "char", "c_varchar": "varchar", "c_string": "string"}
+1      {"c_bool": 1, "c_tinyint": 1, "c_smallint": 2, "c_int": 3, "c_bigint": 
4, "c_largeint": 123456789, "c_float": 6.6, "c_double": 7.7, "c_decimal": 
3.12000, "c_date": "2023-09-08", "c_datetime": "2023-09-08 17:12:34.123456", 
"c_char": "char", "c_varchar": "varchar", "c_string": "string"}
+1      {"c_bool": 1, "c_tinyint": 1, "c_smallint": 2, "c_int": 3, "c_bigint": 
4, "c_largeint": 123456789, "c_float": 6.6, "c_double": 7.7, "c_decimal": 
3.12000, "c_date": "2023-09-08", "c_datetime": "2023-09-08 17:12:34.123456", 
"c_char": "char", "c_varchar": "varchar", "c_string": "string"}
+1      {"c_bool": 1, "c_tinyint": 1, "c_smallint": 2, "c_int": 3, "c_bigint": 
4, "c_largeint": 123456789, "c_float": 6.6, "c_double": 7.7, "c_decimal": 
3.12000, "c_date": "2023-09-08", "c_datetime": "2023-09-08 17:12:34.123456", 
"c_char": "char", "c_varchar": "varchar", "c_string": "string"}
+1      {"c_bool": 1, "c_tinyint": 1, "c_smallint": 2, "c_int": 3, "c_bigint": 
4, "c_largeint": 123456789, "c_float": 6.6, "c_double": 7.7, "c_decimal": 
3.12000, "c_date": "2023-09-08", "c_datetime": "2023-09-08 17:12:34.123456", 
"c_char": "char", "c_varchar": "varchar", "c_string": "string"}
+1      {"c_bool": 1, "c_tinyint": 1, "c_smallint": 2, "c_int": 3, "c_bigint": 
4, "c_largeint": 123456789, "c_float": 6.6, "c_double": 7.7, "c_decimal": 
3.12000, "c_date": "2023-09-08", "c_datetime": "2023-09-08 17:12:34.123456", 
"c_char": "char", "c_varchar": "varchar", "c_string": "string"}
+1      {"c_bool": 1, "c_tinyint": 1, "c_smallint": 2, "c_int": 3, "c_bigint": 
4, "c_largeint": 123456789, "c_float": 6.6, "c_double": 7.7, "c_decimal": 
3.12000, "c_date": "2023-09-08", "c_datetime": "2023-09-08 17:12:34.123456", 
"c_char": "char", "c_varchar": "varchar", "c_string": "string"}
+
diff --git 
a/regression-test/suites/connector_p0/spark_connector/spark_connector_arrow.groovy
 
b/regression-test/suites/connector_p0/spark_connector/spark_connector_arrow.groovy
new file mode 100644
index 00000000000..1cd2ed31d2e
--- /dev/null
+++ 
b/regression-test/suites/connector_p0/spark_connector/spark_connector_arrow.groovy
@@ -0,0 +1,146 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+suite("spark_connector_for_arrow", "connector") {
+
+    sql """use regression_test_connector_p0_spark_connector"""
+
+    sql """
+      CREATE TABLE IF NOT EXISTS `spark_connector_primitive` (
+        `id` int(11) NOT NULL,
+        `c_bool` boolean NULL,
+        `c_tinyint` tinyint NULL,
+        `c_smallint` smallint NULL,
+        `c_int` int NULL,
+        `c_bigint` bigint NULL,
+        `c_largeint` largeint NULL,
+        `c_float` float NULL,
+        `c_double` double NULL,
+        `c_decimal` DECIMAL(10, 5) NULL,
+        `c_date` date NULL,
+        `c_datetime` datetime(6) NULL,
+        `c_char` char(10) NULL,
+        `c_varchar` varchar(10) NULL,
+        `c_string` string NULL
+      ) ENGINE=OLAP
+      DUPLICATE KEY(`id`)
+      COMMENT 'OLAP'
+      DISTRIBUTED BY HASH(`id`) BUCKETS 1
+      PROPERTIES (
+      "replication_allocation" = "tag.location.default: 1"
+      )"""
+
+    sql """
+      CREATE TABLE IF NOT EXISTS `spark_connector_array` (
+        `id` int(11) NOT NULL,
+        `c_array_boolean` ARRAY<boolean> NULL,
+        `c_array_tinyint` ARRAY<tinyint> NULL,
+        `c_array_smallint` ARRAY<smallint> NULL,
+        `c_array_int` ARRAY<int> NULL,
+        `c_array_bigint` ARRAY<bigint> NULL,
+        `c_array_largeint` ARRAY<largeint> NULL,
+        `c_array_float` ARRAY<float> NULL,
+        `c_array_double` ARRAY<double> NULL,
+        `c_array_decimal` ARRAY<DECIMAL(10, 5)> NULL,
+        `c_array_date` ARRAY<date> NULL,
+        `c_array_datetime` ARRAY<datetime(6)> NULL,
+        `c_array_char` ARRAY<char(10)> NULL,
+        `c_array_varchar` ARRAY<varchar(10)> NULL,
+        `c_array_string` ARRAY<string> NULL
+      ) ENGINE=OLAP
+      DUPLICATE KEY(`id`)
+      COMMENT 'OLAP'
+      DISTRIBUTED BY HASH(`id`) BUCKETS 1
+      PROPERTIES (
+      "replication_allocation" = "tag.location.default: 1"
+      );"""
+
+    sql """
+      CREATE TABLE IF NOT EXISTS `spark_connector_map` (
+        `id` int(11) NOT NULL,
+        `c_map_bool` Map<boolean,boolean> NULL,
+        `c_map_tinyint` Map<tinyint,tinyint> NULL,
+        `c_map_smallint` Map<smallint,smallint> NULL,
+        `c_map_int` Map<int,int> NULL,
+        `c_map_bigint` Map<bigint,bigint> NULL,
+        `c_map_largeint` Map<largeint,largeint> NULL,
+        `c_map_float` Map<float,float> NULL,
+        `c_map_double` Map<double,double> NULL,
+        `c_map_decimal` Map<DECIMAL(10, 5),DECIMAL(10, 5)> NULL,
+        `c_map_date` Map<date,date> NULL,
+        `c_map_datetime` Map<datetime(6),datetime(6)> NULL,
+        `c_map_char` Map<char(10),char(10)> NULL,
+        `c_map_varchar` Map<varchar(10),varchar(10)> NULL,
+        `c_map_string` Map<string,string> NULL
+      ) ENGINE=OLAP
+      DUPLICATE KEY(`id`)
+      COMMENT 'OLAP'
+      DISTRIBUTED BY HASH(`id`) BUCKETS 1
+      PROPERTIES (
+      "replication_allocation" = "tag.location.default: 1"
+      );"""
+
+    sql """
+      CREATE TABLE IF NOT EXISTS `spark_connector_struct` (
+        `id` int NOT NULL,
+        `st` STRUCT<
+            `c_bool`:boolean,
+            `c_tinyint`:tinyint(4),
+            `c_smallint`:smallint(6),
+            `c_int`:int(11),
+            `c_bigint`:bigint(20),
+            `c_largeint`:largeint(40),
+            `c_float`:float,
+            `c_double`:double,
+            `c_decimal`:DECIMAL(10, 5),
+            `c_date`:date,
+            `c_datetime`:datetime(6),
+            `c_char`:char(10),
+            `c_varchar`:varchar(10),
+            `c_string`:string
+          > NULL
+      ) ENGINE=OLAP
+      DUPLICATE KEY(`id`)
+      COMMENT 'OLAP'
+      DISTRIBUTED BY HASH(`id`) BUCKETS 1
+      PROPERTIES (
+      "replication_allocation" = "tag.location.default: 1"
+      );"""
+
+    sql """DELETE FROM spark_connector_primitive where id > 0"""
+    sql """DELETE FROM spark_connector_array where id > 0"""
+    sql """DELETE FROM spark_connector_map where id > 0"""
+    sql """DELETE FROM spark_connector_struct where id > 0"""
+
+    def jar_name = 
"spark-doris-connector-3.1_2.12-1.3.0-SNAPSHOT-with-dependencies.jar"
+
+    logger.info("start delete local spark doris demo jar...")
+    def delete_local_spark_jar = "rm -rf ${jar_name}".execute()
+    logger.info("start download spark doris demo ...")
+    logger.info("getS3Url ==== ${getS3Url()}")
+    def download_spark_jar = "/usr/bin/curl 
${getS3Url()}/regression/${jar_name} --output ${jar_name}".execute().getText()
+    logger.info("finish download spark doris demo ...")
+    def run_cmd = "java -cp ${jar_name} 
org.apache.doris.spark.testcase.TestStreamLoadForArrowType 
$context.config.feHttpAddress $context.config.feHttpUser 
regression_test_connector_p0_spark_connector"
+    logger.info("run_cmd : $run_cmd")
+    def run_spark_jar = run_cmd.execute().getText()
+    logger.info("result: $run_spark_jar")
+
+    qt_q01 """ select * from spark_connector_primitive """
+    qt_q02 """ select * from spark_connector_array """
+    qt_q03 """ select * from spark_connector_map """
+    qt_q04 """ select * from spark_connector_struct """
+}


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

(doris) branch master updated: [fix](stream load)add test case and doc for arrow type of stream load (#28098)

Reply via email to