sijie commented on a change in pull request #4786: Add *Understand Schema*
Section
URL: https://github.com/apache/pulsar/pull/4786#discussion_r306603660
##########
File path: site2/docs/schema-understand.md
##########
@@ -0,0 +1,321 @@
+---
+id: schema-understand
+title: Understand schema
+sidebar_label: Understand schema
+---
+
+## `SchemaInfo`
+
+Pulsar schema is defined in a data structure called `SchemaInfo`.
+
+The `SchemaInfo` is stored and enforced on a per-topic basis and cannot be
stored at the namespace or tenant level.
+
+A `SchemaInfo` consists of the following fields:
+
+| Field | Description |
+|---|---|
+| `name` | Schema name (a string). |
+| `type` | Schema type, which determines how to interpret the schema data. |
+| `schema` | Schema data, which is a sequence of 8-bit unsigned bytes and
schema-type specific. |
+| `properties` | A map of string key/value pairs, which is
application-specific. |
+
+**Example**
+
+This is the `SchemaInfo` of a string.
+
+```text
+{
+ "name": "test-string-schema",
+ "type": "STRING",
+ "schema": "",
+ "properties": {}
+}
+```
+
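To make the four fields concrete, here is a minimal Java sketch of a data structure with the same shape. The class `SchemaInfoSketch` and its `ofString` helper are hypothetical names introduced for illustration; they are not Pulsar's own `SchemaInfo` class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for illustration only; NOT Pulsar's SchemaInfo class.
final class SchemaInfoSketch {
    final String name;                    // schema name
    final String type;                    // schema type, e.g. "STRING"
    final byte[] schema;                  // schema data; schema-type specific
    final Map<String, String> properties; // application-specific key/value pairs

    SchemaInfoSketch(String name, String type, byte[] schema, Map<String, String> properties) {
        this.name = name;
        this.type = type;
        this.schema = schema.clone(); // defensive copy
        this.properties = Collections.unmodifiableMap(new TreeMap<>(properties));
    }

    // Mirrors the string example above: empty schema data and no properties.
    static SchemaInfoSketch ofString(String name) {
        return new SchemaInfoSketch(name, "STRING", new byte[0], Collections.emptyMap());
    }

    @Override
    public String toString() {
        return "{name=" + name + ", type=" + type
                + ", schema=" + new String(schema, StandardCharsets.UTF_8)
                + ", properties=" + properties + "}";
    }

    public static void main(String[] args) {
        System.out.println(ofString("test-string-schema"));
    }
}
```
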
+## Schema type
+
+Pulsar supports various schema types, which are mainly divided into two
categories:
+
+* Primitive type
+
+* Complex type
+
+> #### Note
+>
+> If you create a schema without specifying a type, producers and consumers
can only handle raw bytes.
+
+### Primitive type
+
+Currently, Pulsar supports the following primitive types:
+
+| Primitive Type | Description |
+|---|---|
+| `BOOLEAN` | A binary value |
+| `INT8` | An 8-bit signed integer |
+| `INT16` | A 16-bit signed integer |
+| `INT32` | A 32-bit signed integer |
+| `INT64` | A 64-bit signed integer |
+| `FLOAT` | A single-precision (32-bit) IEEE 754 floating-point number |
+| `DOUBLE` | A double-precision (64-bit) IEEE 754 floating-point number |
+| `BYTES` | A sequence of 8-bit unsigned bytes |
+| `STRING` | A Unicode character sequence |
+| `TIMESTAMP` (`DATE`, `TIME`) | A logical type that represents a specific instant
in time with millisecond precision. It stores the number of milliseconds since
`January 1, 1970, 00:00:00 GMT` as an `INT64` value |
+
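The `TIMESTAMP` row describes an instant stored as an `INT64` count of milliseconds since the epoch. The sketch below shows that idea in plain Java. The class name `TimestampCodecSketch` is hypothetical, and the big-endian 8-byte layout is an assumption of this sketch rather than something stated in this document.

```java
import java.nio.ByteBuffer;
import java.time.Instant;

// Hypothetical illustration; not Pulsar's timestamp schema implementation.
final class TimestampCodecSketch {

    // Encodes an instant as a 64-bit count of milliseconds since the epoch.
    // ByteBuffer writes big-endian by default (assumed layout for this sketch).
    static byte[] encode(Instant instant) {
        return ByteBuffer.allocate(Long.BYTES).putLong(instant.toEpochMilli()).array();
    }

    // Reads the 64-bit millisecond count back into an instant.
    static Instant decode(byte[] bytes) {
        return Instant.ofEpochMilli(ByteBuffer.wrap(bytes).getLong());
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Instant roundTripped = decode(encode(now));
        // Anything finer than a millisecond is lost in the round trip.
        System.out.println(now + " -> " + roundTripped);
    }
}
```

Because only milliseconds are stored, any sub-millisecond part of the original instant does not survive encoding.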
+For primitive types, Pulsar does not store any schema data in `SchemaInfo`.
The `type` in `SchemaInfo` is used to determine how to serialize and
deserialize the data.
+
+Some of the primitive schema implementations can use `properties` to store
implementation-specific tunable settings. For example, a `string` schema can
use `properties` to store the encoding charset to serialize and deserialize
strings.
+
+The conversions between **Pulsar schema types** and **language-specific
primitive types** are as follows.
+
+| Schema Type | Java Type | Python Type |
+|---|---|---|
+| BOOLEAN | boolean | bool |
+| INT8 | byte | |
+| INT16 | short | |
+| INT32 | int | |
+| INT64 | long | |
+| FLOAT | float | float |
+| DOUBLE | double | float |
+| BYTES | byte[], ByteBuffer, ByteBuf | bytes |
+| STRING | String | str |
+| TIMESTAMP | java.sql.Timestamp | |
+| TIME | java.sql.Time | |
+| DATE | java.util.Date | |
+
+**Example**
+
+This example demonstrates how to use a string schema.
+
+1. Create a producer with a string schema and send messages.
+
+ ```text
+ // "my-topic" is a placeholder topic name.
+ Producer<String> producer = client.newProducer(Schema.STRING).topic("my-topic").create();
+ producer.newMessage().value("Hello Pulsar!").send();
+ ```
+
+2. Create a consumer with a string schema and receive messages.
+
+ ```text
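+ // "my-topic" and "my-sub" are placeholder names.
+ Consumer<String> consumer = client.newConsumer(Schema.STRING).topic("my-topic").subscriptionName("my-sub").subscribe();
+ consumer.receive();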
+ ```
+
+### Complex type
+
+Currently, Pulsar supports the following complex types:
+
+| Complex Type | Description |
+|---|---|
+| `keyvalue` | Represents a complex type of a key/value pair. |
+| `struct` | Supports **AVRO**, **JSON**, and **Protobuf**. |
+
+* **Complex type 1: `keyvalue`**
+
+ `keyvalue` schema helps applications define schemas for both key and
value.
+
+ For `SchemaInfo` of `keyvalue` schema, Pulsar stores the `SchemaInfo` of
key schema and the `SchemaInfo` of value schema together.
+
+ Pulsar provides two methods to encode a key/value pair in messages:
+
+ * **`INLINE`** mode: a key/value pair will be encoded together in the
message payload.
+
+ * **`SEPARATED`** mode: the key will be encoded in the message key and the
value will be encoded in the message payload.
+
+ Users can choose the encoding type when constructing the key/value schema.
+
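The difference between the two encoding modes can be sketched in plain Java. Everything here is hypothetical and for illustration only: `KeyValueEncodingSketch`, `EncodedMessage`, and the length-prefixed `INLINE` layout are assumptions of this sketch, not Pulsar's API or wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the two modes; not Pulsar's implementation.
final class KeyValueEncodingSketch {

    enum Mode { INLINE, SEPARATED }

    // Simplified stand-in for a message: an optional message key plus a payload.
    static final class EncodedMessage {
        final byte[] messageKey; // null when the key travels inside the payload
        final byte[] payload;

        EncodedMessage(byte[] messageKey, byte[] payload) {
            this.messageKey = messageKey;
            this.payload = payload;
        }
    }

    static EncodedMessage encode(byte[] key, byte[] value, Mode mode) {
        if (mode == Mode.SEPARATED) {
            // The key goes into the message key; the value alone is the payload.
            return new EncodedMessage(key, value);
        }
        // INLINE: key and value share the payload. The layout
        // [key length][key][value length][value] is assumed for this sketch.
        ByteBuffer buf = ByteBuffer.allocate(
                Integer.BYTES + key.length + Integer.BYTES + value.length);
        buf.putInt(key.length).put(key).putInt(value.length).put(value);
        return new EncodedMessage(null, buf.array());
    }

    public static void main(String[] args) {
        byte[] key = "user-1".getBytes(StandardCharsets.UTF_8);
        byte[] value = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println("INLINE payload bytes: "
                + encode(key, value, Mode.INLINE).payload.length);
        System.out.println("SEPARATED payload bytes: "
                + encode(key, value, Mode.SEPARATED).payload.length);
    }
}
```

In this sketch, `SEPARATED` leaves the key available without decoding the payload, while `INLINE` keeps the pair together in a single byte array.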
+ **Example**
Review comment:
Have you verified the final rendered result?