This is an automated email from the ASF dual-hosted git repository.
dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 4eb9a52acef [regression](s3load) Add regression testing and modify
description text for s3load (#25947)
4eb9a52acef is described below
commit 4eb9a52acefab7daa76b5ddd6d05188c436b2731
Author: Zhiyu Hu <[email protected]>
AuthorDate: Wed Nov 1 07:39:16 2023 +0800
[regression](s3load) Add regression testing and modify description text for
s3load (#25947)
---
.../Load/BROKER-LOAD.md | 16 +-
.../Load/BROKER-LOAD.md | 16 +-
.../test_csv_with_enclose_and_escapeS3_load.out | 11 ++
.../test_csv_with_enclose_and_escapeS3_load.groovy | 195 +++++++++++++++++++++
4 files changed, 222 insertions(+), 16 deletions(-)
diff --git
a/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md
b/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md
index 4869c988bd8..6bb4eac3e23 100644
---
a/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md
+++
b/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md
@@ -143,6 +143,14 @@ WITH BROKER broker_name
Specify some parameters of the imported format. For example, if the
imported file is in `json` format, you can specify parameters such as
`json_root`, `jsonpaths`, `fuzzy parse`, etc.
+ - <version since="dev" type="inline"> enclose </version>
+
+ When the csv data field contains row delimiters or column delimiters,
to prevent accidental truncation, single-byte characters can be specified as
brackets for protection. For example, the column separator is ",", the bracket
is "'", and the data is "a,'b,c'", then "b,c" will be parsed as a field.
+
+ - <version since="dev" type="inline"> escape </version>
+
+ Used to escape characters that appear in a csv field identical to the
enclosing characters. For example, if the data is "a,'b,'c'", enclose is "'",
and you want "b,'c to be parsed as a field, you need to specify a single-byte
escape character, such as "\", and then modify the data to "a,' b,\'c'".
+
- `WITH BROKER broker_name`
Specify the Broker service name to be used. In the public cloud Doris.
Broker service name is `bos`
@@ -204,14 +212,6 @@ WITH BROKER broker_name
Set the priority of the load job, there are three options:
`HIGH/NORMAL/LOW`, use `NORMAL` priority as default. The pending broker load
jobs which have higher priority will be chosen to execute earlier.
- - <version since="dev" type="inline"> enclose </version>
-
- When the csv data field contains row delimiters or column delimiters, to
prevent accidental truncation, single-byte characters can be specified as
brackets for protection. For example, the column separator is ",", the bracket
is "'", and the data is "a,'b,c'", then "b,c" will be parsed as a field.
-
- - <version since="dev" type="inline"> escape </version>
-
- Used to escape characters that appear in a csv field identical to the
enclosing characters. For example, if the data is "a,'b,'c'", enclose is "'",
and you want "b,'c to be parsed as a field, you need to specify a single-byte
escape character, such as "\", and then modify the data to "a,' b,\'c'".
-
- <version since="1.2.3" type="inline"> comment </version>
Specify the comment for the import job. The comment can be viewed in the
`show load` statement.
diff --git
a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md
b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md
index d30d25baa75..7f13fca55ab 100644
---
a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md
+++
b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Load/BROKER-LOAD.md
@@ -143,6 +143,14 @@ WITH BROKER broker_name
指定导入的format的一些参数。如导入的文件是`json`格式,则可以在这里指定`json_root`、`jsonpaths`、`fuzzy_parse`等参数。
+ - <version since="dev" type="inline"> enclose </version>
+
+
包围符。当csv数据字段中含有行分隔符或列分隔符时,为防止意外截断,可指定单字节字符作为包围符起到保护作用。例如列分隔符为",",包围符为"'",数据为"a,'b,c'",则"b,c"会被解析为一个字段。
+
+ - <version since="dev" type="inline"> escape </version>
+
+
转义符。用于转义在字段中出现的与包围符相同的字符。例如数据为"a,'b,'c'",包围符为"'",希望"b,'c被作为一个字段解析,则需要指定单字节转义符,例如"\",然后将数据修改为"a,'b,\'c'"。
+
- `WITH BROKER broker_name`
指定需要使用的 Broker 服务名称。在公有云 Doris 中。Broker 服务名称为 `bos`
@@ -203,14 +211,6 @@ WITH BROKER broker_name
设置导入任务的优先级,可选 `HIGH/NORMAL/LOW` 三种优先级,默认为 `NORMAL`,对于处在 `PENDING`
状态的导入任务,更高优先级的任务将优先被执行进入 `LOADING` 状态。
- - <version since="dev" type="inline"> enclose </version>
-
-
包围符。当csv数据字段中含有行分隔符或列分隔符时,为防止意外截断,可指定单字节字符作为包围符起到保护作用。例如列分隔符为",",包围符为"'",数据为"a,'b,c'",则"b,c"会被解析为一个字段。
-
- - <version since="dev" type="inline"> escape </version>
-
-
转义符。用于转义在字段中出现的与包围符相同的字符。例如数据为"a,'b,'c'",包围符为"'",希望"b,'c被作为一个字段解析,则需要指定单字节转义符,例如"\",然后将数据修改为"a,'b,\'c'"。
-
- <version since="1.2.3" type="inline"> comment </version>
指定导入任务的备注信息。可选参数。
diff --git
a/regression-test/data/load_p0/broker_load/test_csv_with_enclose_and_escapeS3_load.out
b/regression-test/data/load_p0/broker_load/test_csv_with_enclose_and_escapeS3_load.out
new file mode 100644
index 00000000000..53aea0d8f89
--- /dev/null
+++
b/regression-test/data/load_p0/broker_load/test_csv_with_enclose_and_escapeS3_load.out
@@ -0,0 +1,11 @@
+-- This file is automatically generated. You should know what you did if you
want to edit this
+-- !select --
+1 abc,d\nef, 2023-07-15 ,,\nghi,j 2023-07-20:05:48:31
kl,mn,\n
+10 ab@@cd@@efg 2023-07-20 ab@@cd$$$efg 2023-07-20:05:48:31
@@ab$$$cd$$$
+10 ab@@cd@@efg 2023-07-20 ab@@cd$$$efg 2023-07-20:05:48:31
@@ab$$$cd$$$
+10 abc "2023-07-15" "def" "2023-07-20:05:48:31" "ghi"
+11 "abc,def" 2023-07-15 ghi 2023-07-20:05:48:31
"jkl\nmne"
+2 abc 2023-07-15 def 2023-07-20:05:48:31 ghi
+3 abc"de,fg"h 2023-07-15 i\nj,k\n" 2023-07-20:05:48:31
ghi
+6 ab"c 2023-07-20 d"ef" 2023-07-20:05:48:31 "g"hi
+7 aaa 2023-07-20 2023-07-20:05:48:31
diff --git
a/regression-test/suites/load_p0/broker_load/test_csv_with_enclose_and_escapeS3_load.groovy
b/regression-test/suites/load_p0/broker_load/test_csv_with_enclose_and_escapeS3_load.groovy
new file mode 100644
index 00000000000..a761ad1a211
--- /dev/null
+++
b/regression-test/suites/load_p0/broker_load/test_csv_with_enclose_and_escapeS3_load.groovy
@@ -0,0 +1,195 @@
+
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+suite("test_csv_with_enclose_and_escapeS3_load", "load_p0") {
+
+ def tableName = "test_csv_with_enclose_and_escape"
+
+ sql """ DROP TABLE IF EXISTS ${tableName} """
+ sql """
+ CREATE TABLE IF NOT EXISTS ${tableName}(
+ `k1` VARCHAR(44) NOT NULL,
+ `k2` VARCHAR(44) NOT NULL,
+ `v1` VARCHAR(44) NOT NULL,
+ `v2` VARCHAR(44) NOT NULL,
+ `v3` VARCHAR(44) NOT NULL,
+ `v4` VARCHAR(44) NOT NULL
+ ) ENGINE=OLAP
+ DUPLICATE KEY(`k1`)
+ COMMENT 'OLAP'
+ DISTRIBUTED BY HASH(`k1`) BUCKETS 3
+ PROPERTIES ("replication_allocation" = "tag.location.default: 1");
+ """
+
+ def normalCases = [
+ 'enclose_normal',
+ 'enclose_with_escape',
+ 'enclose_wrong_position',
+ 'enclose_empty_values'
+ ]
+
+ def attributesList = [
+
+ ]
+
+ for (i in 0..<normalCases.size()) {
+ attributesList.add(new
LoadAttributes("s3://doris-build-1308700295/regression/load/data/${normalCases[i]}.csv",
+ "${tableName}", "LINES TERMINATED BY \"\n\"", "COLUMNS
TERMINATED BY \",\"", "FORMAT AS \"CSV\"", "(k1,k2,v1,v2,v3,v4)",
+ "PROPERTIES (\"enclose\" = \"\\\"\", \"escape\" = \"\\\\\",
\"trim_double_quotes\" = \"true\")"))
+ }
+
+ attributesList.add(new
LoadAttributes("s3://doris-build-1308700295/regression/load/data/enclose_incomplete.csv",
+ "${tableName}", "LINES TERMINATED BY \"\n\"", "COLUMNS TERMINATED BY
\",\"", "FORMAT AS \"CSV\"", "(k1,k2,v1,v2,v3,v4)",
+ "PROPERTIES (\"enclose\" = \"\\\"\", \"escape\" = \"\\\\\",
\"trim_double_quotes\" = \"true\")").addProperties("max_filter_ratio", "0.5"))
+
+ attributesList.add(new
LoadAttributes("s3://doris-build-1308700295/regression/load/data/enclose_without_escape.csv",
+ "${tableName}", "LINES TERMINATED BY \"\n\"", "COLUMNS TERMINATED BY
\",\"", "FORMAT AS \"CSV\"", "(k1,k2,v1,v2,v3,v4)",
+ "PROPERTIES (\"enclose\" = \"\\\"\", \"escape\" = \"\\\\\",
\"trim_double_quotes\" = \"true\")"))
+
+ attributesList.add(new
LoadAttributes("s3://doris-build-1308700295/regression/load/data/enclose_multi_char_delimiter.csv",
+ "${tableName}", "LINES TERMINATED BY \"\$\$\$\"", "COLUMNS TERMINATED
BY \"@@\"", "FORMAT AS \"CSV\"", "(k1,k2,v1,v2,v3,v4)",
+ "PROPERTIES (\"enclose\" = \"\\\"\", \"escape\" = \"\\\\\",
\"trim_double_quotes\" = \"true\")"))
+
+ attributesList.add(new
LoadAttributes("s3://doris-build-1308700295/regression/load/data/enclose_not_trim_quotes.csv",
+ "${tableName}", "", "COLUMNS TERMINATED BY \",\"", "FORMAT AS
\"CSV\"", "(k1,k2,v1,v2,v3,v4)",
+ "PROPERTIES (\"enclose\" = \"\\\"\", \"escape\" =
\"\\\\\")").addProperties("trim_double_quotes", "false"))
+
+ def ak = getS3AK()
+ def sk = getS3SK()
+
+
+ def i = 0
+ for (LoadAttributes attributes : attributesList) {
+ def label = "test_s3_load_escape_enclose" +
UUID.randomUUID().toString().replace("-", "_") + "_" + i
+ attributes.label = label
+ def prop = attributes.getPropertiesStr()
+
+ sql """
+ LOAD LABEL $label (
+ DATA INFILE("$attributes.dataDesc.path")
+ INTO TABLE $attributes.dataDesc.tableName
+ $attributes.dataDesc.columnTermClause
+ $attributes.dataDesc.lineTermClause
+ $attributes.dataDesc.formatClause
+ $attributes.dataDesc.columns
+ $attributes.dataDesc.whereExpr
+ )
+ WITH S3 (
+ "AWS_ACCESS_KEY" = "$ak",
+ "AWS_SECRET_KEY" = "$sk",
+ "AWS_ENDPOINT" = "cos.ap-beijing.myqcloud.com",
+ "AWS_REGION" = "ap-beijing"
+ )
+ ${prop}
+ """
+
+ def max_try_milli_secs = 600000
+ while (max_try_milli_secs > 0) {
+ String[][] result = sql """ show load where
label="$attributes.label" order by createtime desc limit 1; """
+
+ if (result[0][2].equals("FINISHED")) {
+ if (attributes.isExceptFailed) {
+ assertTrue(false, "load should be failed but was success:
$result")
+ }
+ logger.info("Load FINISHED " + attributes.label + ": $result")
+ break
+ }
+ if (result[0][2].equals("CANCELLED")) {
+ if (attributes.dataDesc.path.split("/")[-1] ==
"enclose_incomplete.csv" || attributes.dataDesc.path.split("/")[-1] ==
"enclose_without_escape.csv") {
+ break
+ }
+ if (attributes.isExceptFailed) {
+ logger.info("Load FINISHED " + attributes.label)
+ break
+ }
+ assertTrue(false, "load failed: $result")
+ break
+ }
+ Thread.sleep(1000)
+ max_try_milli_secs -= 1000
+ if (max_try_milli_secs <= 0) {
+ assertTrue(false, "load Timeout: $attributes.label")
+ }
+ }
+ }
+ sql "sync"
+ qt_select """
+ SELECT * FROM ${tableName} ORDER BY k1, k2
+ """
+}
+
+class DataDesc {
+ public String mergeType = ""
+ public String path
+ public String tableName
+ public String lineTermClause
+ public String columnTermClause
+ public String formatClause
+ public String columns
+ public String whereExpr
+}
+
+class LoadAttributes {
+ LoadAttributes(String path, String tableName, String lineTermClause,
String columnTermClause, String formatClause,
+ String columns, String whereExpr, boolean isExceptFailed =
false) {
+ this.dataDesc = new DataDesc()
+ this.dataDesc.path = path
+ this.dataDesc.tableName = tableName
+ this.dataDesc.lineTermClause = lineTermClause
+ this.dataDesc.columnTermClause = columnTermClause
+ this.dataDesc.formatClause = formatClause
+ this.dataDesc.columns = columns
+ this.dataDesc.whereExpr = whereExpr
+
+ this.isExceptFailed = isExceptFailed
+
+ properties = new HashMap<>()
+ properties.put("use_new_load_scan_node", "true")
+ }
+
+ LoadAttributes addProperties(String k, String v) {
+ properties.put(k, v)
+ return this
+ }
+
+ String getPropertiesStr() {
+ if (properties.isEmpty()) {
+ return ""
+ }
+ String prop = "PROPERTIES ("
+ properties.forEach (k, v) -> {
+ prop += "\"${k}\" = \"${v}\","
+ }
+ prop = prop.substring(0, prop.size() - 1)
+ prop += ")"
+ return prop
+ }
+
+ LoadAttributes withPathStyle() {
+ usePathStyle = "true"
+ return this
+ }
+
+ public DataDesc dataDesc
+ public Map<String, String> properties
+ public String label
+ public String usePathStyle = "false"
+ public boolean isExceptFailed
+
+
+}
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]