This is an automated email from the ASF dual-hosted git repository.

dockerzhang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/inlong-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 713d2738aa [INLONG-682][Document] Sort out the format concept and 
related attributes (#683)
713d2738aa is described below

commit 713d2738aa364be79b0d1a9c6a844034ed98a308
Author: feat <[email protected]>
AuthorDate: Thu Feb 9 22:02:17 2023 +0800

    [INLONG-682][Document] Sort out the format concept and related attributes 
(#683)
    
    Co-authored-by: Charles Zhang <[email protected]>
---
 docs/design_and_concept/img/format_and_flink.png   | Bin 0 -> 86161 bytes
 .../img/the_format_in_inlong.png                   | Bin 0 -> 42853 bytes
 docs/design_and_concept/the_format_in_inlong.md    |  95 ++++++++++++++++++++
 .../design_and_concept/img/format_and_flink.png    | Bin 0 -> 86161 bytes
 .../img/the_format_in_inlong.png                   | Bin 0 -> 42853 bytes
 .../design_and_concept/the_format_in_inlong.md     | 100 +++++++++++++++++++++
 6 files changed, 195 insertions(+)

diff --git a/docs/design_and_concept/img/format_and_flink.png 
b/docs/design_and_concept/img/format_and_flink.png
new file mode 100644
index 0000000000..7d49580263
Binary files /dev/null and b/docs/design_and_concept/img/format_and_flink.png 
differ
diff --git a/docs/design_and_concept/img/the_format_in_inlong.png 
b/docs/design_and_concept/img/the_format_in_inlong.png
new file mode 100644
index 0000000000..dac80d2466
Binary files /dev/null and 
b/docs/design_and_concept/img/the_format_in_inlong.png differ
diff --git a/docs/design_and_concept/the_format_in_inlong.md 
b/docs/design_and_concept/the_format_in_inlong.md
new file mode 100644
index 0000000000..94d107ee62
--- /dev/null
+++ b/docs/design_and_concept/the_format_in_inlong.md
@@ -0,0 +1,95 @@
+---
+title: Format 
+sidebar_position: 7
+---
+
+## What is format?
+
+![](img/format_and_flink.png)
+
+As shown in the figure, Flink SQL reads and writes data in the form of a Row. Internally, a Row is an Object array (`Object[]`), and each element of the array represents one field of the Flink table. The type, name, and precision of each field are described by the `Schema`.
+
+Format provides two interfaces: `SerializationSchema` and `DeserializationSchema`:
+- When Flink writes data to MQ, it needs to serialize the `Flink Row` into a format such as `key-value`, `csv`, or `Json`. It does so by calling `SerializationSchema#serialize`, which serializes the data into a `byte[]` that can be written to MQ.
+- When Flink reads data from MQ, the process is reversed: it reads the data from MQ as a `byte[]`, deserializes it into the format, and finally converts it into a `Flink Row`.
+
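The round trip described above can be sketched without any Flink dependency. This is only a minimal illustration, assuming a row whose fields are joined by a single delimiter; `RowCodecSketch` and its methods are hypothetical names that mirror the roles of `SerializationSchema#serialize` and `DeserializationSchema#deserialize`, not the actual sort-formats implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical stand-in for a format: Object[] row <-> byte[] message.
final class RowCodecSketch {

    private final char delimiter;

    RowCodecSketch(char delimiter) {
        this.delimiter = delimiter;
    }

    // Plays the role of SerializationSchema#serialize: row -> bytes for the MQ.
    byte[] serialize(Object[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            sb.append(row[i]);
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Plays the role of DeserializationSchema#deserialize: bytes -> row.
    // Every field comes back as a String; a real format would apply the schema's types.
    Object[] deserialize(byte[] message) {
        String text = new String(message, StandardCharsets.UTF_8);
        // A limit of -1 keeps trailing empty fields.
        return text.split(Pattern.quote(String.valueOf(delimiter)), -1);
    }

    public static void main(String[] args) {
        RowCodecSketch codec = new RowCodecSketch(',');
        byte[] bytes = codec.serialize(new Object[] {"alice", 30, "shenzhen"});
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
        System.out.println(Arrays.toString(codec.deserialize(bytes)));
    }
}
```

The sketch deliberately ignores escaping, quoting, and type conversion, which is what the format options listed later in the document are for.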
+> See details: [`inlong-sort/sort-formats`](https://github.com/apache/inlong/tree/release-1.5.0/inlong-sort/sort-formats)
+
+## The format in InLong
+
+![](img/the_format_in_inlong.png)
+
+InLong serves as a one-stop data integration platform that uses MQ (the Cache part in the picture) as the transmission channel, which decouples DataProxy from Sort and provides better scalability. When DataProxy reports data, it serializes the data into the corresponding format (`SerializationSchema#serialize`). When Sort receives data, it deserializes the MQ data (`DeserializationSchema#deserialize`) into a `Flink Row`, and then writes it to the corresponding storage using [...]
+
+## What are the formats?
+
+Currently, InLong-Sort provides the CSV, KeyValue, and JSON formats, together with the corresponding InLongMsg packaging formats.
+
+### CSV
+
+```xml
+<dependency>
+<groupId>org.apache.inlong</groupId>
+<artifactId>sort-format-csv</artifactId>
+<version>${inlong.version}</version>
+</dependency>
+```
+
+`org.apache.inlong.sort.formats.csv.CsvFormatFactory`
+
+| Option                    | Type    | Required | Default value                            | Advanced | Remark |
+|---------------------------|---------|----------|------------------------------------------|----------|--------|
+| `format.delimiter`        | char    | Y        | `,`                                      | N        |        |
+| `format.escape-character` | char    | N        | disabled                                 | Y        |        |
+| `format.quote-character`  | char    | N        | disabled                                 | Y        |        |
+| `format.null-literal`     | String  | N        | disabled                                 | Y        |        |
+| `format.charset`          | String  | Y        | "UTF-8"                                  | N        |        |
+| `format.ignore-errors`    | Boolean | Y        | true                                     | N        |        |
+| `format.derive_schema`    | Boolean | N        | Required if no format schema is defined. | Y        | Derives the format schema from the table's schema, so that schema information is defined only once. <br/> The names, types, and field order of the format are determined by the table's schema. <br/> Time attributes are ignored if their origin is not a field. <br/> A "from" definition is interpreted as a field renaming in the format. |
+
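To make the roles of `format.delimiter`, `format.escape-character`, `format.quote-character`, and `format.null-literal` more concrete, here is a minimal sketch of how such options could interact when one record is split into fields. `CsvSplitSketch` is a hypothetical helper written for this illustration; the exact semantics of the InLong CSV format may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of splitting one CSV record under the options above.
final class CsvSplitSketch {

    // escape, quote, and nullLiteral may be null, which means "disabled".
    static List<String> split(String line, char delimiter, Character escape,
                              Character quote, String nullLiteral) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (escape != null && c == escape && i + 1 < line.length()) {
                // The character after the escape character is taken literally.
                current.append(line.charAt(++i));
            } else if (quote != null && c == quote) {
                // Quotes toggle a region in which delimiters are plain text.
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                fields.add(toField(current.toString(), nullLiteral));
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(toField(current.toString(), nullLiteral));
        return fields;
    }

    // A field equal to the null literal becomes a real null.
    private static String toField(String text, String nullLiteral) {
        return text.equals(nullLiteral) ? null : text;
    }

    public static void main(String[] args) {
        String line = "1,\"hello, world\",a\\,b,NULL";
        for (String field : split(line, ',', '\\', '"', "NULL")) {
            System.out.println(field);
        }
    }
}
```

With all optional settings disabled (passed as `null`), the method degrades to a plain split on the delimiter.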
+### Key-Value
+
+```xml
+<dependency>
+<groupId>org.apache.inlong</groupId>
+<artifactId>sort-format-kv</artifactId>
+<version>${inlong.version}</version>
+</dependency>
+```
+
+`org.apache.inlong.sort.formats.kv.KvFormatFactory`
+
+| Option                    | Type    | Required | Default value                            | Advanced | Remark |
+|---------------------------|---------|----------|------------------------------------------|----------|--------|
+| `format.entry-delimiter`  | char    | N        | '&'                                      | N        |        |
+| `format.kv-delimiter`     | char    | N        | '='                                      | N        |        |
+| `format.escape-character` | char    | N        | disabled                                 | Y        |        |
+| `format.quote-character`  | char    | N        | disabled                                 | Y        |        |
+| `format.null-literal`     | char    | N        | disabled                                 | Y        |        |
+| `format.charset`          | String  | Y        | "UTF-8"                                  | N        |        |
+| `format.ignore-errors`    | Boolean | Y        | true                                     | N        |        |
+| `format.derive_schema`    | Boolean | N        | Required if no format schema is defined. | Y        | Derives the format schema from the table's schema, so that schema information is defined only once. <br/> The names, types, and field order of the format are determined by the table's schema. <br/> Time attributes are ignored if their origin is not a field. <br/> A "from" definition is interpreted as a field renaming in the format. |
+
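As a rough illustration of the two delimiters in the table above, the sketch below splits a key-value record using the default `'&'` entry delimiter and `'='` key-value delimiter. `KvSplitSketch` is a hypothetical helper, not part of `sort-format-kv`, and it ignores escaping, quoting, and the null literal.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration of format.entry-delimiter and format.kv-delimiter.
final class KvSplitSketch {

    static Map<String, String> parse(String text, char entryDelimiter, char kvDelimiter) {
        // LinkedHashMap keeps the fields in the order they appear in the record.
        Map<String, String> result = new LinkedHashMap<>();
        if (text.isEmpty()) {
            return result;
        }
        for (String entry : text.split(Pattern.quote(String.valueOf(entryDelimiter)))) {
            int pos = entry.indexOf(kvDelimiter);
            if (pos < 0) {
                // No key-value delimiter: keep the key with a null value (a choice of this sketch).
                result.put(entry, null);
            } else {
                result.put(entry.substring(0, pos), entry.substring(pos + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {name=alice, age=30, city=shenzhen}
        System.out.println(parse("name=alice&age=30&city=shenzhen", '&', '='));
    }
}
```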
+### JSON
+
+```xml
+<dependency>
+<groupId>org.apache.flink</groupId>
+<artifactId>flink-json</artifactId>
+<version>${flink.version}</version>
+</dependency>
+```
+
+`org.apache.flink.formats.json.JsonFormatFactory`
+
+`org.apache.flink.formats.json.JsonOptions`
+
+| Option                           | Type    | Required | Default value | Advanced | Remark |
+|----------------------------------|---------|----------|---------------|----------|--------|
+| `ignore-parse-errors`            | Boolean | N        | false         | N        | Optional flag to skip fields and rows with parse errors instead of failing; <br/> fields are set to null in case of errors, false by default. |
+| `map-null-key.mode`              | String  | N        | "FAIL"        | Y        | Optional flag to control the handling mode when serializing null keys for map data. <br/> Option DROP will drop null key entries for map data. <br/> Option LITERAL will use 'map-null-key.literal' as key literal. |
+| `map-null-key.literal`           | String  | N        | "null"        | Y        | Optional flag to specify the string literal for null keys when 'map-null-key.mode' is LITERAL. |
+| `encode.decimal-as-plain-number` | Boolean | N        | false         | Y        | Optional flag to specify whether to encode all decimals as plain numbers instead of possible scientific notations, false by default. |
+| `timestamp-format.standard`      | String  | N        | "SQL"         | Y        | Optional flag to specify the timestamp format, SQL by default. <br/> Option ISO-8601 will parse input timestamps in "yyyy-MM-ddTHH:mm:ss.s{precision}" format and output timestamps in the same format. <br/> Option SQL will parse input timestamps in "yyyy-MM-dd HH:mm:ss.s{precision}" format and output timestamps in the same format. |
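
The effect of `encode.decimal-as-plain-number` and `timestamp-format.standard` can be illustrated with plain JDK classes. The sketch below does not call the Flink JSON format; `JsonOptionSketch` and its methods are hypothetical, and the timestamp patterns omit the fractional-second part for brevity.

```java
import java.math.BigDecimal;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of two JSON options using only JDK classes.
final class JsonOptionSketch {

    // plain = true mimics encode.decimal-as-plain-number = true.
    static String encodeDecimal(BigDecimal value, boolean plain) {
        return plain ? value.toPlainString() : value.toString();
    }

    // standard is "SQL" or "ISO-8601", as in timestamp-format.standard.
    static String formatTimestamp(LocalDateTime ts, String standard) {
        String pattern = "ISO-8601".equals(standard)
                ? "yyyy-MM-dd'T'HH:mm:ss"
                : "yyyy-MM-dd HH:mm:ss";
        return DateTimeFormatter.ofPattern(pattern).format(ts);
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("1E+10");
        System.out.println(encodeDecimal(value, false)); // 1E+10
        System.out.println(encodeDecimal(value, true));  // 10000000000

        LocalDateTime ts = LocalDateTime.of(2023, 2, 9, 22, 2, 17);
        System.out.println(formatTimestamp(ts, "SQL"));      // 2023-02-09 22:02:17
        System.out.println(formatTimestamp(ts, "ISO-8601")); // 2023-02-09T22:02:17
    }
}
```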
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/img/format_and_flink.png
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/img/format_and_flink.png
new file mode 100644
index 0000000000..7d49580263
Binary files /dev/null and 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/img/format_and_flink.png
 differ
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/img/the_format_in_inlong.png
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/img/the_format_in_inlong.png
new file mode 100644
index 0000000000..dac80d2466
Binary files /dev/null and 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/img/the_format_in_inlong.png
 differ
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/the_format_in_inlong.md
 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/the_format_in_inlong.md
new file mode 100644
index 0000000000..00c8b2c14a
--- /dev/null
+++ 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/the_format_in_inlong.md
@@ -0,0 +1,100 @@
+---
+title: Format
+sidebar_position: 7
+---
+
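+## What is Format?
+
+![](img/format_and_flink.png)
+
+As shown in the figure above, Flink SQL reads and writes data in the form of a Row, which internally is an Object array `Object[]`; each element of the array represents one field of the Flink table.
+The type, name, precision, and other information of each field are described by the `Schema`.
+
+Flink's Format provides two interfaces: SerializationSchema and DeserializationSchema.
+
+- When Flink writes data to MQ, it needs to serialize the `Flink Row` into a format such as `key-value`, `csv`, or `Json`.
+  It does so by calling `SerializationSchema#serialize`, which serializes the data into a `byte[]` that is written to `MQ`.
+- When Flink reads data from MQ, the process is reversed: the data is read from MQ as a `byte[]`, deserialized into the `Format`, and then converted into a `Flink Row`.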
+
+> For details, see the code in [`inlong-sort/sort-formats`](https://github.com/apache/inlong/tree/release-1.5.0/inlong-sort/sort-formats)
+
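+## The Format in InLong
+
+![](img/the_format_in_inlong.png)
+
+As a one-stop data integration platform, InLong uses MQ (the Cache part in the figure) as the transmission channel, which also decouples DataProxy from Sort and improves scalability:
+
+- When DataProxy reports data, it needs to serialize the data into the corresponding format (`SerializationSchema#serialize`).
+- When Sort receives data, it deserializes the MQ data (`DeserializationSchema#deserialize`) into a `Flink Row` and writes it to the corresponding storage through Flink SQL.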
+
+## What formats are there?
+
+Currently, InLong-Sort provides CSV, KeyValue, and JSON, as well as the formats wrapped by InLongMsg.
+
+### CSV
+
+```xml
+<dependency>
+<groupId>org.apache.inlong</groupId>
+<artifactId>sort-format-csv</artifactId>
+<version>${inlong.version}</version>
+</dependency>
+```
+
+`org.apache.inlong.sort.formats.csv.CsvFormatFactory`
+
+| Option                    | Type    | Required | Default value                            | Advanced | Remark |
+|---------------------------|---------|----------|------------------------------------------|----------|--------|
+| `format.delimiter`        | char    | Y        | `,`                                      | N        |        |
+| `format.escape-character` | char    | N        | disabled                                 | Y        |        |
+| `format.quote-character`  | char    | N        | disabled                                 | Y        |        |
+| `format.null-literal`     | String  | N        | disabled                                 | Y        |        |
+| `format.charset`          | String  | Y        | "UTF-8"                                  | N        |        |
+| `format.ignore-errors`    | Boolean | Y        | true                                     | N        |        |
+| `format.derive_schema`    | Boolean | N        | Required if no format schema is defined. | Y        | Derives the format schema from the table's schema, so that schema information is defined only once. <br/> The names, types, and field order of the format are determined by the table's schema. <br/> Time attributes are ignored if they are not fields. <br/> A "from" definition is interpreted as a field renaming in the format. |
+
+### Key-Value
+
+```xml
+<dependency>
+<groupId>org.apache.inlong</groupId>
+<artifactId>sort-format-kv</artifactId>
+<version>${inlong.version}</version>
+</dependency>
+```
+
+`org.apache.inlong.sort.formats.kv.KvFormatFactory`
+
+| Option                    | Type    | Required | Default value                            | Advanced | Remark |
+|---------------------------|---------|----------|------------------------------------------|----------|--------|
+| `format.entry-delimiter`  | char    | N        | '&'                                      | N        |        |
+| `format.kv-delimiter`     | char    | N        | '='                                      | N        |        |
+| `format.escape-character` | char    | N        | disabled                                 | Y        |        |
+| `format.quote-character`  | char    | N        | disabled                                 | Y        |        |
+| `format.null-literal`     | char    | N        | disabled                                 | Y        |        |
+| `format.charset`          | String  | Y        | "UTF-8"                                  | N        |        |
+| `format.ignore-errors`    | Boolean | Y        | true                                     | N        |        |
+| `format.derive_schema`    | Boolean | N        | Required if no format schema is defined. | Y        | Derives the format schema from the table's schema, so that schema information is defined only once. <br/> The names, types, and field order of the format are determined by the table's schema. <br/> Time attributes are ignored if they are not fields. <br/> A "from" definition is interpreted as a field renaming in the format. |
+
+### JSON
+
+```xml
+<dependency>
+<groupId>org.apache.flink</groupId>
+<artifactId>flink-json</artifactId>
+<version>${flink.version}</version>
+</dependency>
+```
+
+`org.apache.flink.formats.json.JsonFormatFactory`
+
+`org.apache.flink.formats.json.JsonOptions`
+
+| Option                           | Type    | Required | Default value | Advanced | Remark |
+|----------------------------------|---------|----------|---------------|----------|--------|
+| `ignore-parse-errors`            | Boolean | N        | false         | N        | Optional flag to skip fields and rows with parse errors instead of failing; <br/> fields are set to null in case of errors, false by default. |
+| `map-null-key.mode`              | String  | N        | "FAIL"        | Y        | Optional flag to control the handling mode when serializing null keys for map data. <br/> Option DROP will drop null key entries for map data. <br/> Option LITERAL will use 'map-null-key.literal' as key literal. |
+| `map-null-key.literal`           | String  | N        | "null"        | Y        | Optional flag to specify the string literal for null keys when 'map-null-key.mode' is LITERAL. |
+| `encode.decimal-as-plain-number` | Boolean | N        | false         | Y        | Optional flag to specify whether to encode all decimals as plain numbers instead of scientific notation, false by default. |
+| `timestamp-format.standard`      | String  | N        | "SQL"         | Y        | Optional flag to specify the timestamp format, SQL by default. <br/> Option ISO-8601 will parse input timestamps in "yyyy-MM-ddTHH:mm:ss.s{precision}" format and output timestamps in the same format. <br/> Option SQL will parse input timestamps in "yyyy-MM-dd HH:mm:ss.s{precision}" format and output timestamps in the same format. |
