imay commented on a change in pull request #1569: Enable Partition Discovery for Broker Load URL: https://github.com/apache/incubator-doris/pull/1569#discussion_r310419374
##########
File path: docs/help/Contents/Data Manipulation/broker_load.md
##########
@@ -352,15 +357,47 @@
         )
         WITH BROKER hdfs ("username"="hdfs_user", "password"="hdfs_password");

-     8. Import data from a Parquet file. Specify FORMAT as parquet; by default the format is determined from the file suffix
+    8. Import data from a Parquet file. Specify FORMAT as parquet; by default the format is determined from the file suffix
         LOAD LABEL example_db.label9
         (
         DATA INFILE("hdfs://hdfs_host:hdfs_port/user/palo/data/input/file")
         INTO TABLE `my_table`
         FORMAT AS "parquet"
         (k1, k2, k3)
         )
-        WITH BROKER hdfs ("username"="hdfs_user", "password"="hdfs_password");
+        WITH BROKER hdfs ("username"="hdfs_user", "password"="hdfs_password");
+
+    9. Extract the partitioned fields in the file path through Partition Discovery
+        If the import path is a directory, all parquet files under that directory are listed recursively
+        If needed, the partitioned fields in the file path are parsed according to the column types defined in the table, similar to how Spark reads parquet files
+        1. Without specifying the base path (BASE_PATH) for Partition Discovery
+            LOAD LABEL example_db.label10
+            (
+            DATA INFILE("hdfs://hdfs_host:hdfs_port/user/palo/data/input/dir")
+            INTO TABLE `my_table`
+            FORMAT AS "parquet"
+            (k1, k2, k3)
+            )
+            WITH BROKER hdfs ("username"="hdfs_user", "password"="hdfs_password");
+
+            The directory hdfs://hdfs_host:hdfs_port/user/palo/data/input/dir contains the following files: [hdfs://hdfs_host:hdfs_port/user/palo/data/input/dir/k1=key1/xxx.parquet, hdfs://hdfs_host:hdfs_port/user/palo/data/input/dir/k1=key2/xxx.parquet, ...]
+            The value of the partitioned field corresponding to k1 is then extracted from the file path, and the data import completes
+
+        2. Specifying the base path (BASE_PATH) for Partition Discovery
+            LOAD LABEL example_db.label11
+            (
+            DATA INFILE("hdfs://hdfs_host:hdfs_port/user/palo/data/input/dir/city=beijing/utc_date=2019-06-26")
+            INTO TABLE `my_table`
+            FORMAT AS "csv"
+            BASE_PATH AS "hdfs://hdfs_host:hdfs_port/user/palo/data/input/dir/"
+            (k1, k2, k3, utc_date,city)

Review comment:
The columns listed here generally denote the column names contained in the file. Adding utc_date and city may confuse users.
I suggest not declaring the partition column names here.
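For readers following the thread, the path-based extraction that example 9 describes can be sketched in Python roughly as below. This is only an illustration of the idea, not the Doris implementation: the function name `extract_partition_fields` and the file name `xxx.csv` are hypothetical, and conversion of the extracted strings to the column types defined in the table is left out.

```python
def extract_partition_fields(file_path: str, base_path: str) -> dict:
    """Collect key=value directory segments that sit below base_path.

    Hypothetical helper for illustration; values are returned as raw strings.
    """
    base = base_path.rstrip("/")
    if not file_path.startswith(base + "/"):
        raise ValueError("file_path is not located under base_path")

    # Path of the file relative to the base path, e.g. "city=beijing/.../xxx.csv".
    relative = file_path[len(base) + 1:]

    # Every segment except the last one (the file name) is a directory.
    directories = relative.split("/")[:-1]

    fields = {}
    for segment in directories:
        key, sep, value = segment.partition("=")
        # Keep only directories shaped like key=value; skip plain directories.
        if sep and key:
            fields[key] = value
    return fields


base = "hdfs://hdfs_host:hdfs_port/user/palo/data/input/dir/"
path = base + "city=beijing/utc_date=2019-06-26/xxx.csv"
print(extract_partition_fields(path, base))
# {'city': 'beijing', 'utc_date': '2019-06-26'}
```

In this sketch city and utc_date come entirely from the path, which is the reason behind the comment above: they are not columns stored in the file, so listing them next to k1, k2, k3 blurs that distinction.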