michaelli916 commented on a change in pull request #13459:
URL: https://github.com/apache/flink/pull/13459#discussion_r619767625



##########
File path: docs/content.zh/docs/connectors/table/filesystem.md
##########
@@ -24,15 +24,13 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-# FileSystem SQL Connector
+# 文件系统 SQL 连接器 
 
-This connector provides access to partitioned files in filesystems
-supported by the [Flink FileSystem abstraction]({{< ref "docs/deployment/filesystems/overview" >}}).
+该连接器提供了对 [Flink 文件系统抽象]({{< ref "docs/deployment/filesystems/overview" >}}) 支持的文件系统中的分区文件的访问.
 
-The file system connector itself is included in Flink and does not require an additional dependency.
-A corresponding format needs to be specified for reading and writing rows from and to a file system.
+文件系统连接器本身就被包括在 Flink 中,不需要任何额外的依赖。当从文件系统中读取或向文件系统写入记录时,需要指定相应的记录格式。
 
-The file system connector allows for reading and writing from a local or distributed filesystem. A filesystem table can be defined as:
+文件系统连接器支持对本地文件系统或分布式文件系统的读取和写入。 可以通过如下方式定义文件系统表:
 
 ```sql
 CREATE TABLE MyUserTable (

Review comment:
       OK.
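
For reference, a minimal filesystem table of the kind this hunk introduces could look like the sketch below; the column names and path are hypothetical, while `connector`, `path`, and `format` are the connector's required options.

```sql
-- A minimal sketch of a filesystem table (hypothetical schema and path).
CREATE TABLE MyUserTable (
  user_id BIGINT,
  message STRING,
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',            -- use the filesystem connector
  'path' = 'file:///tmp/my_user_table',  -- a local or distributed filesystem URI
  'format' = 'json'                      -- the row format used when reading and writing
);
```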

##########
File path: docs/content.zh/docs/connectors/table/filesystem.md
##########
@@ -149,15 +145,14 @@ a timeout that specifies the maximum duration for which a file can be open.
   </tbody>
 </table>
 
-**NOTE:** For bulk formats (parquet, orc, avro), the rolling policy in combination with the checkpoint interval(pending files

Review comment:
       OK.

##########
File path: docs/content.zh/docs/connectors/table/filesystem.md
##########
@@ -149,15 +145,14 @@ a timeout that specifies the maximum duration for which a file can be open.
   </tbody>
 </table>
 
-**NOTE:** For bulk formats (parquet, orc, avro), the rolling policy in combination with the checkpoint interval(pending files
-become finished on the next checkpoint) control the size and number of these parts.
+**注意:** 对于 bulk 格式 (parquet, orc, avro), 滚动策略和检查点间隔控制了分区文件的大小和个数 (未完成的文件会在下个检查点完成).
 
-**NOTE:** For row formats (csv, json), you can set the parameter `sink.rolling-policy.file-size` or `sink.rolling-policy.rollover-interval` in the connector properties and parameter `execution.checkpointing.interval` in flink-conf.yaml together
-if you don't want to wait a long period before observe the data exists in file system. For other formats (avro, orc), you can just set parameter `execution.checkpointing.interval` in flink-conf.yaml.
+**注意:** 对于行格式 (csv, json), 如果想使得分区文件更快地在文件系统中可见,可以设置连接器参数 `sink.rolling-policy.file-size` 或 `sink.rolling-policy.rollover-interval` ,以及 flink-conf.yaml 中的 `execution.checkpointing.interval` 。
+对于其他格式 (avro, orc), 可以只设置 flink-conf.yaml 中的 `execution.checkpointing.interval` 。
 
-### File Compaction
+### 文件压缩

Review comment:
       Right, “文件压缩” (file compression) is ambiguous here; “文件合并” (file merging) is indeed better.
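
To make the rolling-policy note concrete, a row-format sink could combine the two connector options named above roughly as in this sketch (table name, schema, and path are hypothetical); `execution.checkpointing.interval` would still be set in flink-conf.yaml, since pending files only become finished on a checkpoint.

```sql
-- Sketch: a csv sink whose part files roll over by size or by open time.
-- The table name and path are made up; the rolling-policy keys are the ones cited above.
CREATE TABLE csv_sink (
  user_id BIGINT,
  message STRING
) WITH (
  'connector' = 'filesystem',
  'path' = 'hdfs:///tmp/csv_sink',
  'format' = 'csv',
  'sink.rolling-policy.file-size' = '128MB',          -- roll once a part file reaches this size
  'sink.rolling-policy.rollover-interval' = '15 min'  -- or once it has been open for this long
);
```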

##########
File path: docs/content.zh/docs/connectors/table/filesystem.md
##########
@@ -184,24 +179,25 @@ The file sink supports file compactions, which allows applications to have small
   </tbody>
 </table>
 
-If enabled, file compaction will merge multiple small files into larger files based on the target file size.
-When running file compaction in production, please be aware that:
-- Only files in a single checkpoint are compacted, that is, at least the same number of files as the number of checkpoints is generated.
-- The file before merging is invisible, so the visibility of the file may be: checkpoint interval + compaction time.
-- If the compaction takes too long, it will backpressure the job and extend the time period of checkpoint.
+启用该参数后,文件压缩功能会根据设定的目标文件大小,合并多个小文件到大文件。
+当在生产环境使用文件压缩功能时,需要注意:
+- 只有检查点内部的文件才会被压缩,也就是说,至少会生成跟检查点个数一样多的文件。
+- 合并前文件是不可见的,所以文件的可见性是:检查点间隔 + 压缩时长。
+- 如果压缩花费的时间很长,会对作业产生背压,延长检查点所需时间。

Review comment:
       OK.

##########
File path: docs/content.zh/docs/connectors/table/filesystem.md
##########
@@ -372,9 +364,9 @@ public class AnalysisCommitPolicy implements PartitionCommitPolicy {
 
 ```
 
-## Sink Parallelism
+## Sink 并行度
 
-The parallelism of writing files into external file system (including Hive) can be configured by the corresponding table option, which is supported both in streaming mode and in batch mode. By default, the parallelism is configured to being the same as the parallelism of its last upstream chained operator. When the parallelism which is different from the parallelism of the upstream parallelism is configured, the operator of writing files and the operator compacting files (if used) will apply the parallelism.
+向外部文件系统(包括 hive) 写文件时的并行度,在流处理模式和批处理模式下,都可以通过对应的 table 选项指定。默认情况下,该并行度跟上一个上游的 chained operator 的并行度一样。当配置了跟上一个上游的 chained operator 不一样的并行度时,写文件的算子和压缩文件的算子(如果使用了的话)会使用指定的并行度。

Review comment:
       OK.
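
As a sketch of the option this paragraph describes, the sink parallelism can be pinned directly in the table definition (table name, schema, and path below are hypothetical):

```sql
-- Sketch: fixing the parallelism of the writing (and, if enabled, compacting) operators.
CREATE TABLE parallel_sink (
  user_id BIGINT,
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = 's3://my-bucket/parallel_sink',  -- hypothetical path
  'format' = 'parquet',
  'sink.parallelism' = '4'  -- must be greater than zero
);
```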

##########
File path: docs/content.zh/docs/connectors/table/filesystem.md
##########
@@ -173,35 +161,36 @@ The file sink supports file compactions, which allows applications to have small
         <td><h5>auto-compaction</h5></td>
         <td style="word-wrap: break-word;">false</td>
         <td>Boolean</td>
-        <td>Whether to enable automatic compaction in streaming sink or not. The data will be written to temporary files. After the checkpoint is completed, the temporary files generated by a checkpoint will be compacted. The temporary files are invisible before compaction.</td>
+        <td> 在流式 sink 中是否开启自动合并功能。数据首先会被写入到临时文件,在检查点完成后,该检查点产生的临时文件会被合并。这些临时文件在合并前不可见.</td>
     </tr>
     <tr>
         <td><h5>compaction.file-size</h5></td>
         <td style="word-wrap: break-word;">(none)</td>
         <td>MemorySize</td>
-        <td>The compaction target file size, the default value is the rolling file size.</td>
+        <td> 合并目标文件大小,默认值是滚动文件大小.</td>
     </tr>
   </tbody>
 </table>
 
-If enabled, file compaction will merge multiple small files into larger files based on the target file size.
-When running file compaction in production, please be aware that:
-- Only files in a single checkpoint are compacted, that is, at least the same number of files as the number of checkpoints is generated.
-- The file before merging is invisible, so the visibility of the file may be: checkpoint interval + compaction time.
-- If the compaction takes too long, it will backpressure the job and extend the time period of checkpoint.
+启用该参数后,文件压缩功能会根据设定的目标文件大小,合并多个小文件到大文件。

Review comment:
       OK, I'll double-check and change them all to “合并” consistently.
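
A small sketch of the two compaction options discussed in this hunk, with a hypothetical table and path:

```sql
-- Sketch: enable small-file merging on a streaming filesystem sink.
CREATE TABLE compacted_sink (
  user_id BIGINT,
  dt STRING
) PARTITIONED BY (dt) WITH (
  'connector' = 'filesystem',
  'path' = 'hdfs:///warehouse/compacted_sink',  -- hypothetical path
  'format' = 'parquet',
  'auto-compaction' = 'true',       -- merge the temporary files produced within each checkpoint
  'compaction.file-size' = '128MB'  -- target size; falls back to the rolling file size if unset
);
```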

##########
File path: docs/content.zh/docs/connectors/table/filesystem.md
##########
@@ -279,24 +264,27 @@ Time extractors define extracting time from partition values.
         <td><h5>partition.time-extractor.kind</h5></td>
         <td style="word-wrap: break-word;">default</td>
         <td>String</td>
-        <td>Time extractor to extract time from partition values. Support default and custom. For default, can configure timestamp pattern. For custom, should configure extractor class.</td>
+        <td>从分区字段提取时间的时间提取器。支持默认值和定制。对于默认值,可以配置时间戳模式。对于定制,应指定提取器类.</td>
     </tr>
     <tr>
         <td><h5>partition.time-extractor.class</h5></td>
         <td style="word-wrap: break-word;">(none)</td>
         <td>String</td>
-        <td>The extractor class for implement PartitionTimeExtractor interface.</td>
+        <td>实现了接口 PartitionTimeExtractor 的提取器类.</td>
     </tr>
     <tr>
         <td><h5>partition.time-extractor.timestamp-pattern</h5></td>
-        <td style="word-wrap: break-word;">(none)</td>
+****        <td style="word-wrap: break-word;">(none)</td>

Review comment:
       oops, my fault.
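
For context, the three extractor options in this hunk are typically used together along these lines; the table, columns, and pattern below are an illustrative assumption for a day/hour-partitioned sink:

```sql
-- Sketch: the default time extractor assembling a timestamp from two partition columns.
CREATE TABLE hourly_sink (
  user_id BIGINT,
  log_ts TIMESTAMP(3),
  dt STRING,
  `hour` STRING
) PARTITIONED BY (dt, `hour`) WITH (
  'connector' = 'filesystem',
  'path' = 'hdfs:///warehouse/hourly_sink',  -- hypothetical path
  'format' = 'parquet',
  'partition.time-extractor.kind' = 'default',
  'partition.time-extractor.timestamp-pattern' = '$dt $hour:00:00'  -- assumed pattern building 'yyyy-MM-dd HH:00:00' from the partition values
);
```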

##########
File path: docs/content.zh/docs/connectors/table/filesystem.md
##########
@@ -397,17 +389,21 @@ The parallelism of writing files into external file system (including Hive) can
         <td><h5>sink.parallelism</h5></td>
         <td style="word-wrap: break-word;">(none)</td>
         <td>Integer</td>
-        <td>Parallelism of writing files into external file system. The value should greater than zero otherwise exception will be thrown.</td>
+        <td> 向外部文件系统写文件时的并行度。必须大于 0,否则会抛出异常.</td>
     </tr>
     
   </tbody>
 </table>
 
-**NOTE:** Currently, Configuring sink parallelism is supported if and only if the changelog mode of upstream is **INSERT-ONLY**. Otherwise, exception will be thrown.
+**注意:** 当前,只有在上游的 changelog 模式是 **INSERT-ONLY** 时,才支持设置 sink 的并行度。否则的话,会抛出异常。
 
-## Full Example
+## 完整示例
 
+<<<<<<< HEAD

Review comment:
        I will revise the whole section under "完整示例".

##########
File path: docs/content.zh/docs/connectors/table/filesystem.md
##########
@@ -217,13 +206,15 @@ To define when to commit a partition, providing partition commit trigger:
         <td><h5>sink.partition-commit.trigger</h5></td>
         <td style="word-wrap: break-word;">process-time</td>
         <td>String</td>
-        <td>Trigger type for partition commit: 'process-time': based on the time of the machine, it neither requires partition time extraction nor watermark generation. Commit partition once the 'current system time' passes 'partition creation system time' plus 'delay'. 'partition-time': based on the time that extracted from partition values, it requires watermark generation. Commit partition once the 'watermark' passes 'time extracted from partition values' plus 'delay'.</td>
+        <td>分区提交触发器类型。 
+         'process-time': 基于机器时间,既不需要分区时间提取器也不需要水印生成器,一旦 ”当前系统时间“ 超过了 “分区创建系统时间” 和 'sink.partition-commit.delay' 之和,就提交分区;
+         'partition-time': 基于从分区字段提取的时间,需要水印生成器,一旦 “水印” 超过了 ”从分区字段提取的时间“ 和 'sink.partition-commit.delay' 之和,就提交分区.</td>
     </tr>
     <tr>
         <td><h5>sink.partition-commit.delay</h5></td>
         <td style="word-wrap: break-word;">0 s</td>
         <td>Duration</td>
-        <td>The partition will not commit until the delay time. If it is a daily partition, should be '1 d', if it is a hourly partition, should be '1 h'.</td>
+        <td>该延迟时间之前分区不会被提交。如果是按天的分区,应配置为 '1 d', 如果是按小时的分区,应配置为 '1 h'.</td>
     </tr>
     <tr>
         <td><h5>sink.partition-commit.watermark-time-zone</h5></td>

Review comment:
       ok, done.
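
To tie the trigger and delay options together, an hourly-partitioned sink could commit partitions on watermark progress roughly as sketched below; the table, path, and the commit-policy line are assumptions for illustration, and the watermark itself would come from the upstream source table.

```sql
-- Sketch: commit an hourly partition once the watermark passes the extracted
-- partition time plus the configured delay.
CREATE TABLE partitioned_sink (
  user_id BIGINT,
  log_ts TIMESTAMP(3),
  dt STRING,
  `hour` STRING
) PARTITIONED BY (dt, `hour`) WITH (
  'connector' = 'filesystem',
  'path' = 'hdfs:///warehouse/partitioned_sink',        -- hypothetical path
  'format' = 'parquet',
  'sink.partition-commit.trigger' = 'partition-time',   -- use the time extracted from partition values
  'sink.partition-commit.delay' = '1 h',                 -- hourly partitions, so delay by one hour
  'sink.partition-commit.policy.kind' = 'success-file'   -- assumed policy for making the commit visible
);
```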




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

