[GitHub] [incubator-doris] wyb commented on a change in pull request #4463: [Doc] Add spark load sql statement doc and update manual

GitBox Thu, 27 Aug 2020 09:08:39 -0700


wyb commented on a change in pull request #4463:
URL: https://github.com/apache/incubator-doris/pull/4463#discussion_r478532957




##########
File path: docs/zh-CN/sql-reference/sql-statements/Data Manipulation/SPARK LOAD.md
##########
@@ -0,0 +1,263 @@
+---
+{
+    "title": "SPARK LOAD",
+    "language": "zh-CN"
+}
+---
+
+<!-- 
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# SPARK LOAD
+## description
+
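+    Spark load uses external Spark resources to preprocess the data being loaded. This improves Doris load performance for large data volumes and saves compute resources on the Doris cluster. It is mainly intended for initial migration and other scenarios that load large volumes of data into Doris.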
+
+    Spark load is an asynchronous load method: users create a Spark-type load job over the MySQL protocol and check the result with `SHOW LOAD`.
+
+Syntax:
+
+    LOAD LABEL load_label
+    (
+    data_desc1[, data_desc2, ...]
+    )
+    WITH RESOURCE resource_name
+    [resource_properties]
+    [opt_properties];
+
+    1. load_label
+
+        Label of the current load batch. It must be unique within a database.
+        Syntax:
+        [database_name.]your_label
+     
+    2. data_desc
+
+        Describes one batch of data to load.
+        Syntax:
+            DATA INFILE
+            (
+            "file_path1"[, "file_path2", ...]
+            )
+            [NEGATIVE]
+            INTO TABLE `table_name`
+            [PARTITION (p1, p2)]
+            [COLUMNS TERMINATED BY "column_separator"]
+            [FORMAT AS "file_type"]
+            [(column_list)]
+            [COLUMNS FROM PATH AS (col2, ...)]
+            [SET (k1 = func(k2))]
+            [WHERE predicate]    
+
+            DATA FROM TABLE hive_external_tbl
+            [NEGATIVE]
+            INTO TABLE tbl_name
+            [PARTITION (p1, p2)]
+            [SET (k1=f1(xx), k2=f2(xx))]
+            [WHERE predicate]
+
+        Notes:
+            file_path: 
+
+            File path. It can point to a single file, or use the * wildcard to match all files in a directory. The wildcard must match files, not directories.
+
+            hive_external_tbl:
+
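+            Name of the Hive external table.
+            Every column of the target Doris table must exist in the Hive external table.
+            Each load job can load from only one Hive external table.
+            This form cannot be combined with the file_path form.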
+
+            PARTITION:
+
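+            If specified, only the listed partitions are loaded; data outside those partitions is filtered out.
+            If not specified, all partitions of the table are loaded by default.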
+        
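+            NEGATIVE:
+
+            If specified, this loads a batch of "negative" data, used to cancel out the same batch of data loaded earlier.
+            This parameter applies only when the table has value columns and all of them use the SUM aggregation type.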
+            
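+            column_separator:
+
+            Specifies the column separator used in the load file. Defaults to \t.
+            For an invisible character, add the \\x prefix and write the separator in hexadecimal.
+            For example, the Hive file separator \x01 is specified as "\\x01".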
+            
+            file_type:
+
+            Specifies the type of the load file. Currently only csv is supported. 
+ 
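+            column_list:
+
+            Specifies the mapping between the columns in the load file and the columns in the table.
+            To skip a column in the load file, give it a column name that does not exist in the table.
+            Syntax: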
+            (col_name1, col_name2, ...)
+            
+            SET:
+
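+            If specified, a column of the source file can be transformed by a function, and the transformed result is loaded into the table. The syntax is `column_name` = expression. A few examples follow to help illustrate this.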

Review comment:
       Already supported
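Three of the parameter rules in the quoted diff (the `\\x` hex form of `column_separator`, skipping a file column through `column_list`, and `NEGATIVE` cancelling an earlier batch) can be illustrated with a small Python model. This is a hypothetical sketch, not Doris code: `decode_separator`, `map_row`, `load`, and the dict-based table are invented for illustration. It also assumes the separator reaches the loader as the four characters `\x01` after SQL string unescaping.

```python
# Hypothetical model of three rules from the SPARK LOAD doc above.
# None of these names are Doris APIs; they only illustrate the semantics.


def decode_separator(sep: str) -> str:
    """Interpret a separator written with the \\x prefix as hexadecimal.

    Assumes `sep` is the text after SQL string unescaping, so the SQL
    literal "\\x01" arrives here as the four characters \\x01.
    """
    if sep.startswith("\\x"):
        return bytes.fromhex(sep[2:]).decode("latin-1")
    return sep


def map_row(fields, column_list, table_columns):
    """Pair file fields with column_list names, dropping names the table lacks."""
    return {
        name: value
        for name, value in zip(column_list, fields)
        if name in table_columns
    }


def load(rows, key_cols, sum_cols, table, negative=False):
    """Fold rows into `table` as SUM aggregates; NEGATIVE flips the sign."""
    sign = -1 if negative else 1
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        sums = table.setdefault(key, {c: 0 for c in sum_cols})
        for c in sum_cols:
            sums[c] += sign * int(row[c])
    return table


if __name__ == "__main__":
    table_columns = {"k1", "v1"}  # hypothetical table: key k1, SUM value v1
    sep = decode_separator("\\x01")
    lines = ["a\x01unused\x0110", "b\x01unused\x015"]
    column_list = ["k1", "tmp_skip", "v1"]  # tmp_skip is not a table column
    rows = [map_row(line.split(sep), column_list, table_columns) for line in lines]

    table = {}
    load(rows, ["k1"], ["v1"], table)
    print(table)  # {('a',): {'v1': 10}, ('b',): {'v1': 5}}
    load(rows, ["k1"], ["v1"], table, negative=True)
    print(table)  # {('a',): {'v1': 0}, ('b',): {'v1': 0}}
```

Modelling the table as SUM accumulators shows why the doc restricts `NEGATIVE` to SUM value columns. Adding the negated batch restores the previous sums, whereas an aggregation such as REPLACE or MAX has no inverse to apply.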




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



