RocMarshal commented on a change in pull request #18655:
URL: https://github.com/apache/flink/pull/18655#discussion_r815266179



##########
File path: docs/content.zh/docs/connectors/table/filesystem.md
##########
@@ -43,37 +42,36 @@ CREATE TABLE MyUserTable (
   part_name1 INT,
   part_name2 STRING
 ) PARTITIONED BY (part_name1, part_name2) WITH (
-  'connector' = 'filesystem',           -- required: specify the connector
-  'path' = 'file:///path/to/whatever',  -- required: path to a directory
-  'format' = '...',                     -- required: file system connector requires to specify a format,
-                                        -- Please refer to Table Formats
-                                        -- section for more details
-  'partition.default-name' = '...',     -- optional: default partition name in case the dynamic partition
-                                        -- column value is null/empty string
-
-  -- optional: the option to enable shuffle data by dynamic partition fields in sink phase, this can greatly
-  -- reduce the number of file for filesystem sink but may lead data skew, the default value is false.
+  'connector' = 'filesystem',           -- 必选:指定连接器类型
+  'path' = 'file:///path/to/whatever',  -- 必选:指定路径
+  'format' = '...',                     -- 必选:文件系统连接器指定格式
+                                        -- 有关更多详情,请参考 Table Formats
+  'partition.default-name' = '...',     -- 可选:默认的分区名,动态分区模式下分区字段值是 null 或空字符串
+
+  -- 可选:该选项开启了在 sink 阶段通过动态分区字段来 shuffle 数据,该功能可以大大减少文件系统 sink 的文件数,但是可能会导致数据倾斜,默认值是 false
   'sink.shuffle-by-partition.enable' = '...',
   ...
 )
 ```
 
 {{< hint info >}}
-Make sure to include [Flink File System specific dependencies]({{< ref "docs/deployment/filesystems/overview" >}}).
+请确保包含 [Flink File System specific dependencies]({{< ref "docs/deployment/filesystems/overview" >}}) 。
 {{< /hint >}}
 
 {{< hint info >}}
-File system sources for streaming is still under development. In the future, the community will add support for common streaming use cases, i.e., partition and directory monitoring.
+基于流的文件系统 sources 仍在开发中。未来,社区将增加对常见的流式用例的支持,例如,对分区和目录的监控等。
 {{< /hint >}}
 
 {{< hint warning >}}
-The behaviour of file system connector is much different from `previous legacy filesystem connector`:
-the path parameter is specified for a directory not for a file and you can't get a human-readable file in the path that you declare.
+文件系统连接器的行为与 `previous legacy filesystem connector` 有很大不同:

Review comment:
       文件系统连接器的特性与 `previous legacy filesystem connector` 有很大不同:
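
For readers following the thread, a filled-in version of the generic DDL under review might look like the sketch below; the table name, columns, path, and the csv format are illustrative stand-ins, not part of the patch:

```sql
-- Illustrative sketch only: names and option values are placeholders
CREATE TABLE Users (
  user_id BIGINT,
  order_amount DOUBLE,
  part_name1 INT,
  part_name2 STRING
) PARTITIONED BY (part_name1, part_name2) WITH (
  'connector' = 'filesystem',                  -- required: connector type
  'path' = 'file:///tmp/users',                -- required: a directory, not a file
  'format' = 'csv',                            -- required: any supported table format
  'partition.default-name' = '__DEFAULT__',    -- optional: used when a dynamic partition value is null/empty
  'sink.shuffle-by-partition.enable' = 'false' -- optional: defaults to false
);
```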

##########
File path: docs/content.zh/docs/connectors/table/filesystem.md
##########
@@ -43,37 +42,36 @@ CREATE TABLE MyUserTable (
   part_name1 INT,
   part_name2 STRING
 ) PARTITIONED BY (part_name1, part_name2) WITH (
-  'connector' = 'filesystem',           -- required: specify the connector
-  'path' = 'file:///path/to/whatever',  -- required: path to a directory
-  'format' = '...',                     -- required: file system connector requires to specify a format,
-                                        -- Please refer to Table Formats
-                                        -- section for more details
-  'partition.default-name' = '...',     -- optional: default partition name in case the dynamic partition
-                                        -- column value is null/empty string
-
-  -- optional: the option to enable shuffle data by dynamic partition fields in sink phase, this can greatly
-  -- reduce the number of file for filesystem sink but may lead data skew, the default value is false.
+  'connector' = 'filesystem',           -- 必选:指定连接器类型
+  'path' = 'file:///path/to/whatever',  -- 必选:指定路径
+  'format' = '...',                     -- 必选:文件系统连接器指定格式
+                                        -- 有关更多详情,请参考 Table Formats
+  'partition.default-name' = '...',     -- 可选:默认的分区名,动态分区模式下分区字段值是 null 或空字符串
+
+  -- 可选:该选项开启了在 sink 阶段通过动态分区字段来 shuffle 数据,该功能可以大大减少文件系统 sink 的文件数,但是可能会导致数据倾斜,默认值是 false
   'sink.shuffle-by-partition.enable' = '...',
   ...
 )
 ```
 
 {{< hint info >}}
-Make sure to include [Flink File System specific dependencies]({{< ref "docs/deployment/filesystems/overview" >}}).
+请确保包含 [Flink File System specific dependencies]({{< ref "docs/deployment/filesystems/overview" >}}) 。
 {{< /hint >}}
 
 {{< hint info >}}
-File system sources for streaming is still under development. In the future, the community will add support for common streaming use cases, i.e., partition and directory monitoring.
+基于流的文件系统 sources 仍在开发中。未来,社区将增加对常见的流式用例的支持,例如,对分区和目录的监控等。
 {{< /hint >}}
 
 {{< hint warning >}}
-The behaviour of file system connector is much different from `previous legacy filesystem connector`:
-the path parameter is specified for a directory not for a file and you can't get a human-readable file in the path that you declare.
+文件系统连接器的行为与 `previous legacy filesystem connector` 有很大不同:
+path 属性指定的是目录,而不是文件,该目录下的文件也不是肉眼可读的。
 {{< /hint >}}
 
-## Partition Files
+<a name="partition-files"></a>
 
-Flink's file system partition support uses the standard hive format. However, it does not require partitions to be pre-registered with a table catalog. Partitions are discovered and inferred based on directory structure. For example, a table partitioned based on the directory below would be inferred to contain `datetime` and `hour` partitions.
+## 分区文件
+
+Flink 的文件系统连接器支持分区,使用了标准的 hive 格式。但是,不需要预先注册分区,而是基于目录结构自动做了分区发现。 例如,根据下面的目录结构,进行分区的表将被推断为包含 `datetime` 和 `hour` 分区。

Review comment:
       ```suggestion
   Flink 的文件系统连接器支持分区,使用了标准的 hive 格式。但是,不需要预先注册分区到 table catalog,而是基于目录结构自动做了分区发现。 例如,根据下面的目录结构,分区表将被推断包含 `datetime` 和 `hour` 分区。
   ```
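
For context, the hive-style layout both versions of this paragraph describe encodes partition column values in directory names, along the lines of the tree in the surrounding document:

```
path
└── datetime=2019-08-25
    └── hour=11
        ├── part-0.parquet
        └── part-1.parquet
```

A table declared with `PARTITIONED BY (datetime, hour)` over this directory gets its `datetime=2019-08-25`, `hour=11` partition discovered from the structure alone, with no catalog registration.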

##########
File path: docs/content.zh/docs/connectors/table/filesystem.md
##########
@@ -88,95 +86,101 @@ path
         ├── part-0.parquet
 ```
 
-The file system table supports both partition inserting and overwrite inserting. See [INSERT Statement]({{< ref "docs/dev/table/sql/insert" >}}). When you insert overwrite to a partitioned table, only the corresponding partition will be overwritten, not the entire table.
+文件系统表支持分区新增插入和分区覆盖插入。请参考 [INSERT Statement]({{< ref "docs/dev/table/sql/insert" >}}) 。当对分区表进行分区覆盖插入时,只有相应的分区会被覆盖,而不是整个表。
+
+<a name="file-formats"></a>
 
-## File Formats
+## 文件格式

Review comment:
       IMO, keep the original text and keep ‘Format(s)’ consistent throughout the article.
   cc @wuchong @Thesharing 
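
To make the partition-overwrite semantics discussed in this hunk concrete, a minimal sketch, assuming the `MyUserTable` DDL from earlier in the page (column names assumed) and a hypothetical `SourceTable`:

```sql
-- Static partition overwrite: only the data under the
-- part_name1=1/part_name2=a directory is replaced; all
-- other partitions of MyUserTable stay untouched.
INSERT OVERWRITE MyUserTable PARTITION (part_name1 = 1, part_name2 = 'a')
SELECT user_id, order_amount FROM SourceTable;
```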

##########
File path: docs/content.zh/docs/connectors/table/filesystem.md
##########
@@ -88,95 +86,101 @@ path
         ├── part-0.parquet
 ```
 
-The file system table supports both partition inserting and overwrite inserting. See [INSERT Statement]({{< ref "docs/dev/table/sql/insert" >}}). When you insert overwrite to a partitioned table, only the corresponding partition will be overwritten, not the entire table.
+文件系统表支持分区新增插入和分区覆盖插入。请参考 [INSERT Statement]({{< ref "docs/dev/table/sql/insert" >}}) 。当对分区表进行分区覆盖插入时,只有相应的分区会被覆盖,而不是整个表。
+
+<a name="file-formats"></a>
 
-## File Formats
+## 文件格式
 
-The file system connector supports multiple formats:
+文件系统连接器支持多种格式:
 
-- CSV: [RFC-4180](https://tools.ietf.org/html/rfc4180). Uncompressed.
-- JSON: Note JSON format for file system connector is not a typical JSON file but uncompressed [newline delimited JSON](http://jsonlines.org/).
-- Avro: [Apache Avro](http://avro.apache.org). Support compression by configuring `avro.codec`.
-- Parquet: [Apache Parquet](http://parquet.apache.org). Compatible with Hive.
-- Orc: [Apache Orc](http://orc.apache.org). Compatible with Hive.
-- Debezium-JSON: [debezium-json]({{< ref "docs/connectors/table/formats/debezium" >}}).
-- Canal-JSON: [canal-json]({{< ref "docs/connectors/table/formats/canal" >}}).
-- Raw: [raw]({{< ref "docs/connectors/table/formats/raw" >}}).
+- CSV:[RFC-4180](https://tools.ietf.org/html/rfc4180) 。非压缩格式。
+- JSON:注意,文件系统连接器的 JSON 格式不是传统的标准的 JSON 格式,而是非压缩的。[换行符分割的 JSON](http://jsonlines.org/) 。
+- Avro:[Apache Avro](http://avro.apache.org) 。通过配置 `avro.codec` 属性支持压缩。
+- Parquet:[Apache Parquet](http://parquet.apache.org) 。兼容 hive。
+- Orc:[Apache Orc](http://orc.apache.org) 。兼容 hive。
+- Debezium-JSON:[debezium-json]({{< ref "docs/connectors/table/formats/debezium" >}}) 。
+- Canal-JSON:[canal-json]({{< ref "docs/connectors/table/formats/canal" >}}) 。
+- Raw:[raw]({{< ref "docs/connectors/table/formats/raw" >}}) 。

Review comment:
       ```suggestion
   - CSV:[RFC-4180](https://tools.ietf.org/html/rfc4180)。非压缩格式。
   - JSON:注意,文件系统连接器的 JSON 格式不是传统的标准的 JSON 格式,而是非压缩的。[换行符分割的 JSON](http://jsonlines.org/)。
   - Avro:[Apache Avro](http://avro.apache.org)。通过配置 `avro.codec` 属性支持压缩。
   - Parquet:[Apache Parquet](http://parquet.apache.org)。兼容 hive。
   - Orc:[Apache Orc](http://orc.apache.org)。兼容 hive。
   - Debezium-JSON:[debezium-json]({{< ref "docs/connectors/table/formats/debezium" >}})。
   - Canal-JSON:[canal-json]({{< ref "docs/connectors/table/formats/canal" >}})。
   - Raw:[raw]({{< ref "docs/connectors/table/formats/raw" >}})。
   ```
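
As a concrete illustration of the `avro.codec` note in this list, a minimal sketch; the table name, path, and choice of the snappy codec are assumptions for the example:

```sql
CREATE TABLE CompressedAvroTable (
  user_id BIGINT,
  message STRING
) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///tmp/avro-data',
  'format' = 'avro',
  'avro.codec' = 'snappy'   -- enables compression for the written Avro files
);
```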

##########
File path: docs/content.zh/docs/connectors/table/filesystem.md
##########
@@ -43,37 +42,36 @@ CREATE TABLE MyUserTable (
   part_name1 INT,
   part_name2 STRING
 ) PARTITIONED BY (part_name1, part_name2) WITH (
-  'connector' = 'filesystem',           -- required: specify the connector
-  'path' = 'file:///path/to/whatever',  -- required: path to a directory
-  'format' = '...',                     -- required: file system connector requires to specify a format,
-                                        -- Please refer to Table Formats
-                                        -- section for more details
-  'partition.default-name' = '...',     -- optional: default partition name in case the dynamic partition
-                                        -- column value is null/empty string
-
-  -- optional: the option to enable shuffle data by dynamic partition fields in sink phase, this can greatly
-  -- reduce the number of file for filesystem sink but may lead data skew, the default value is false.
+  'connector' = 'filesystem',           -- 必选:指定连接器类型
+  'path' = 'file:///path/to/whatever',  -- 必选:指定路径
+  'format' = '...',                     -- 必选:文件系统连接器指定格式
+                                        -- 有关更多详情,请参考 Table Formats
+  'partition.default-name' = '...',     -- 可选:默认的分区名,动态分区模式下分区字段值是 null 或空字符串
+
+  -- 可选:该选项开启了在 sink 阶段通过动态分区字段来 shuffle 数据,该功能可以大大减少文件系统 sink 的文件数,但是可能会导致数据倾斜,默认值是 false
   'sink.shuffle-by-partition.enable' = '...',
   ...
 )
 ```
 
 {{< hint info >}}
-Make sure to include [Flink File System specific dependencies]({{< ref "docs/deployment/filesystems/overview" >}}).
+请确保包含 [Flink File System specific dependencies]({{< ref "docs/deployment/filesystems/overview" >}}) 。
 {{< /hint >}}
 
 {{< hint info >}}
-File system sources for streaming is still under development. In the future, the community will add support for common streaming use cases, i.e., partition and directory monitoring.
+基于流的文件系统 sources 仍在开发中。未来,社区将增加对常见的流式用例的支持,例如,对分区和目录的监控等。
 {{< /hint >}}
 
 {{< hint warning >}}
-The behaviour of file system connector is much different from `previous legacy filesystem connector`:
-the path parameter is specified for a directory not for a file and you can't get a human-readable file in the path that you declare.
+文件系统连接器的行为与 `previous legacy filesystem connector` 有很大不同:
+path 属性指定的是目录,而不是文件,该目录下的文件也不是肉眼可读的。

Review comment:
       Maybe you could translate it in a better way.
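
For context, the sentence being reworded makes a concrete claim: `path` names a directory, and the sink fills it with opaquely named part files rather than one readable file at that location, schematically:

```
file:///path/to/whatever
├── part-<uuid>-0
└── part-<uuid>-1
```

(the part-file names above are schematic placeholders).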



