sijie commented on a change in pull request #4786: Add *Understand Schema* Section
URL: https://github.com/apache/pulsar/pull/4786#discussion_r306603660
 
 

 ##########
 File path: site2/docs/schema-understand.md
 ##########
 @@ -0,0 +1,321 @@
+---
+id: schema-understand
+title: Understand schema
+sidebar_label: Understand schema
+---
+
+## `SchemaInfo`
+
+A Pulsar schema is defined in a data structure called `SchemaInfo`.
+
+A `SchemaInfo` is stored and enforced on a per-topic basis; it cannot be stored at the namespace or tenant level.
+
+A `SchemaInfo` consists of the following fields:
+
+| Field | Description |
+|---|---|
+| `name` | Schema name (a string). |
+| `type` | Schema type, which determines how to interpret the schema data. |
+| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes whose interpretation is specific to the schema type. |
+| `properties` | A map of string key/value pairs whose meaning is application-specific. |
+
+**Example**
+
+This is the `SchemaInfo` of a string.
+
+```text
+{
+    "name": "test-string-schema",
+    "type": "STRING",
+    "schema": "",
+    "properties": {}
+}
+```
+
+## Schema type
+
+Pulsar supports various schema types, which fall into two categories:
+
+* Primitive type 
+
+* Complex type
+
+> #### Note
+> 
+> If you create a schema without specifying a type, producers and consumers can only handle raw bytes.
+
+### Primitive type
+
+Currently, Pulsar supports the following primitive types:
+
+| Primitive Type | Description |
+|---|---|
+| `BOOLEAN` | A binary value |
+| `INT8` | An 8-bit signed integer |
+| `INT16` | A 16-bit signed integer |
+| `INT32` | A 32-bit signed integer |
+| `INT64` | A 64-bit signed integer |
+| `FLOAT` | A single-precision (32-bit) IEEE 754 floating-point number |
+| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number |
+| `BYTES` | A sequence of 8-bit unsigned bytes |
+| `STRING` | A Unicode character sequence |
+| `TIMESTAMP` (`DATE`, `TIME`) | A logical type that represents a specific instant in time with millisecond precision. It stores the number of milliseconds since `January 1, 1970, 00:00:00 GMT` as an `INT64` value. |
+
+For primitive types, Pulsar does not store any schema data in `SchemaInfo`. The `type` in `SchemaInfo` is used to determine how to serialize and deserialize the data.
+
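To make the `TIMESTAMP` row concrete, here is a minimal sketch of the encoding it describes: an instant reduced to milliseconds since the epoch and written as a 64-bit integer. The class name `TimestampCodecSketch` and the big-endian byte order are assumptions made for illustration; this is not taken from Pulsar's implementation.

```java
import java.nio.ByteBuffer;
import java.sql.Timestamp;

// Hypothetical helper for illustration only; not part of the Pulsar API.
final class TimestampCodecSketch {

    // Encode the timestamp as milliseconds since 1970-01-01T00:00:00 GMT in 8 bytes.
    // ByteBuffer writes big-endian by default; that byte order is an assumption here.
    static byte[] encode(Timestamp ts) {
        return ByteBuffer.allocate(Long.BYTES).putLong(ts.getTime()).array();
    }

    // Decode the 8 bytes back into a Timestamp with millisecond precision.
    static Timestamp decode(byte[] bytes) {
        return new Timestamp(ByteBuffer.wrap(bytes).getLong());
    }

    public static void main(String[] args) {
        Timestamp original = new Timestamp(1_563_926_400_000L);
        Timestamp roundTripped = decode(encode(original));
        System.out.println(original.getTime() == roundTripped.getTime());
    }
}
```

Because only milliseconds are stored, any sub-millisecond component of a `java.sql.Timestamp` would be lost in a round trip through this sketch.
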
+Some primitive schema implementations can use `properties` to store implementation-specific tunable settings. For example, a `STRING` schema can use `properties` to store the charset used to serialize and deserialize strings.
+
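The idea of a string schema reading its charset from `properties` can be sketched as follows. The property key `"charset"`, the UTF-8 fallback, and the class name `StringCodecSketch` are hypothetical and chosen for illustration; the key and defaults that Pulsar actually uses may differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

// Hypothetical codec for illustration only; not part of the Pulsar API.
final class StringCodecSketch {

    // Assumed property key; not necessarily the key Pulsar uses.
    static final String CHARSET_KEY = "charset";

    private final Charset charset;

    // Pick the charset named in the properties map, falling back to UTF-8.
    StringCodecSketch(Map<String, String> properties) {
        String name = properties.get(CHARSET_KEY);
        this.charset = (name == null) ? StandardCharsets.UTF_8 : Charset.forName(name);
    }

    byte[] encode(String value) {
        return value.getBytes(charset);
    }

    String decode(byte[] bytes) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        StringCodecSketch latin1 =
                new StringCodecSketch(Collections.singletonMap(CHARSET_KEY, "ISO-8859-1"));
        byte[] bytes = latin1.encode("caf\u00e9");
        System.out.println(bytes.length + " bytes, decoded: " + latin1.decode(bytes));
    }
}
```

The same string yields different byte sequences under different charsets, which is why a producer and consumer would need to agree on the setting carried in `properties`.
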
+The conversions between **Pulsar schema types** and **language-specific primitive types** are as follows.
+
+| Schema Type | Java Type| Python Type |
+|---|---|---|
+| BOOLEAN | boolean | bool |
+| INT8 | byte | |
+| INT16 | short | | 
+| INT32 | int | |
+| INT64 | long | |
+| FLOAT | float | float |
+| DOUBLE | double | float |
+| BYTES | byte[], ByteBuffer, ByteBuf | bytes |
+| STRING | String | str |
+| TIMESTAMP | java.sql.Timestamp | |
+| TIME | java.sql.Time | |
+| DATE | java.util.Date | |
+
+**Example**
+
+This example demonstrates how to use a string schema.
+
+1. Create a producer with a string schema and send messages.
+
+    ```text
+    Producer<String> producer = client.newProducer(Schema.STRING).create();
+    producer.newMessage().value("Hello Pulsar!").send();
+    ```
+
+2. Create a consumer with a string schema and receive messages.  
+
+    ```text
+    Consumer<String> consumer = client.newConsumer(Schema.STRING).subscribe();
+    consumer.receive();
+    ```
+
+### Complex type
+
+Currently, Pulsar supports the following complex types:
+
+| Complex Type | Description |
+|---|---|
+| `keyvalue` | Represents a complex type of a key/value pair. |
+| `struct` | Supports **AVRO**, **JSON**, and **Protobuf**. |
+
+* **Complex type 1: `keyvalue`**
+
+    A `keyvalue` schema helps applications define schemas for both the key and the value.
+
+    For the `SchemaInfo` of a `keyvalue` schema, Pulsar stores the `SchemaInfo` of the key schema and the `SchemaInfo` of the value schema together.
+
+    Pulsar provides two methods to encode a key/value pair in messages: 
+
+    * **`INLINE`** mode: the key/value pair is encoded together in the message payload.
+  
+    * **`SEPARATED`** mode: the key is encoded in the message key and the value is encoded in the message payload.
+  
+    Users can choose the encoding type when constructing the key/value schema.
+
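The difference between the two modes can be sketched in plain Java. In this illustration, `INLINE` packs a length-prefixed key and value into one payload, while `SEPARATED` returns the value alone as the payload and leaves the key to travel as the message key. The class name, the `Encoded` holder, and the 4-byte length-prefix layout are assumptions made for the sketch, not a statement of Pulsar's wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of the Pulsar API.
final class KeyValueEncodingSketch {

    enum Mode { INLINE, SEPARATED }

    // Holds the two places a message can carry data in this sketch.
    static final class Encoded {
        final byte[] messageKey; // null when the key lives inside the payload
        final byte[] payload;

        Encoded(byte[] messageKey, byte[] payload) {
            this.messageKey = messageKey;
            this.payload = payload;
        }
    }

    static Encoded encode(byte[] key, byte[] value, Mode mode) {
        if (mode == Mode.SEPARATED) {
            // The key goes to the message key, the value to the payload.
            return new Encoded(key, value);
        }
        // INLINE: [key length][key bytes][value length][value bytes] (assumed layout).
        ByteBuffer buf = ByteBuffer.allocate(
                Integer.BYTES + key.length + Integer.BYTES + value.length);
        buf.putInt(key.length).put(key).putInt(value.length).put(value);
        return new Encoded(null, buf.array());
    }

    public static void main(String[] args) {
        byte[] key = "k1".getBytes(StandardCharsets.UTF_8);
        byte[] value = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(encode(key, value, Mode.INLINE).payload.length);    // 4 + 2 + 4 + 5 = 15
        System.out.println(encode(key, value, Mode.SEPARATED).payload.length); // 5
    }
}
```

One practical consequence of a separated layout is that the key stays visible without decoding the payload, which is useful for anything that operates on message keys.
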
+    **Example**
 
 Review comment:
   Have you verified the final rendered result?
