317brian commented on code in PR #16840:
URL: https://github.com/apache/druid/pull/16840#discussion_r1704433228


##########
docs/ingestion/data-formats.md:
##########
@@ -731,6 +731,148 @@ This query returns:
 |--------------------|-----------|---------------|---------------|
 | `development`      | `wiki-edit` | `1680795276351` | `wiki-edits`  |
 
+### Kinesis
+
+The `kinesis` input format lets you parse the Kinesis metadata fields in addition to the Kinesis payload value contents.
+It should only be used when ingesting from Kinesis.
+
+The `kinesis` input format wraps around the payload parsing input format and augments the data it outputs with the kinesis event timestamp, and partition key.
+
+If there are conflicts between column names in the payload and those created from the metadata, the payload takes precedence.
+This ensures that upgrading a Kinesis ingestion to use the Kinesis input format (by taking its existing input format and setting it as the `valueFormat`) can be done without losing any of the payload data.
+
+Configure the Kinesis `inputFormat` as follows:
+
+| Field | Type | Description | Required | Default |
+|-------|------|---------------------------------------------------------------------------------------------------------------------------------------------------|----------|---------------------|
+| `type` | String | Set value to `kinesis`. | yes ||
+| `valueFormat` | [InputFormat](#input-format) | The [input format](#input-format) to parse the Kinesis value payload. | yes ||
+| `partitionKeyColumnName` | String | The name of the column for the Kinesis partition key. This field is useful when ingesting data from multiple partitions into the same datasource. | no | `kinesis.partitionKey` |
+| `timestampColumnName` | String | The name of the column for the Kinesis timestamp. | no | `kinesis.timestamp` |
+
+#### Example
+
+Using `{ "type": "json" }` as the input format would only parse the payload value.
+To parse the Kinesis metadata in addition to the payload, use the `kinesis` input format.
+
+For example, consider the following structure for a Kinesis message that represents an edit in a development environment:

Review Comment:
   The official Kinesis docs prefer "record" over "message", even though they're the same thing.
   
   ```suggestion
   For example, consider the following structure for a Kinesis record that represents an edit in a development environment:
   ```



##########
docs/ingestion/data-formats.md:
##########
@@ -731,6 +731,148 @@ This query returns:
 |--------------------|-----------|---------------|---------------|
 | `development`      | `wiki-edit` | `1680795276351` | `wiki-edits`  |
 
+### Kinesis
+
+The `kinesis` input format lets you parse the Kinesis metadata fields in addition to the Kinesis payload value contents.
+It should only be used when ingesting from Kinesis.
+
+The `kinesis` input format wraps around the payload parsing input format and augments the data it outputs with the kinesis event timestamp, and partition key.

Review Comment:
   ```suggestion
   The `kinesis` input format wraps around the payload parsing input format and augments the data it outputs with the Kinesis event timestamp and partition key (the `ApproximateArrivalTimestamp` and `PartitionKey` fields of the Kinesis record).
   ```
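
The following minimal Python sketch models how the `kinesis` input format is described as combining a parsed payload with the two metadata fields. It models the documented behavior only and is not Druid's Java implementation. The function `augment_row` and the plain-dict row shape are hypothetical. The default column names `kinesis.timestamp` and `kinesis.partitionKey` come from the table in the diff.

```python
from typing import Any, Dict

# Default metadata column names, taken from the table in the diff.
DEFAULT_TIMESTAMP_COLUMN = "kinesis.timestamp"
DEFAULT_PARTITION_KEY_COLUMN = "kinesis.partitionKey"


def augment_row(
    payload_row: Dict[str, Any],
    approximate_arrival_timestamp_ms: int,
    partition_key: str,
    timestamp_column: str = DEFAULT_TIMESTAMP_COLUMN,
    partition_key_column: str = DEFAULT_PARTITION_KEY_COLUMN,
) -> Dict[str, Any]:
    """Hypothetical helper: add Kinesis metadata columns to a row parsed by valueFormat."""
    # Start from the payload columns so they keep their values and order.
    row = dict(payload_row)
    # setdefault only adds a metadata column when the payload has no column
    # with that name, so payload values take precedence on conflicts.
    row.setdefault(timestamp_column, approximate_arrival_timestamp_ms)
    row.setdefault(partition_key_column, partition_key)
    return row
```

Using `setdefault` means a metadata value is written only when the payload lacks a column of that name. That mirrors the documented rule that the payload wins on name conflicts.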



##########
docs/ingestion/data-formats.md:
##########
@@ -731,6 +731,148 @@ This query returns:
 |--------------------|-----------|---------------|---------------|
 | `development`      | `wiki-edit` | `1680795276351` | `wiki-edits`  |
 
+### Kinesis
+
+The `kinesis` input format lets you parse the Kinesis metadata fields in addition to the Kinesis payload value contents.
+It should only be used when ingesting from Kinesis.
+
+The `kinesis` input format wraps around the payload parsing input format and augments the data it outputs with the Kinesis event timestamp, and partition key.

Review Comment:
   ```suggestion
   The `kinesis` input format wraps around the payload parsing input format and augments the data it outputs with the Kinesis event timestamp and partition key (the `ApproximateArrivalTimestamp` and `PartitionKey` fields of the Kinesis record).
   ```



##########
docs/ingestion/data-formats.md:
##########
@@ -731,6 +731,148 @@ This query returns:
 |--------------------|-----------|---------------|---------------|
 | `development`      | `wiki-edit` | `1680795276351` | `wiki-edits`  |
 
+### Kinesis
+
+The `kinesis` input format lets you parse the Kinesis metadata fields in addition to the Kinesis payload value contents.
+It should only be used when ingesting from Kinesis.
+
+The `kinesis` input format wraps around the payload parsing input format and augments the data it outputs with the Kinesis event timestamp, and partition key.
+
+If there are conflicts between column names in the payload and those created from the metadata, the payload takes precedence.
+This ensures that upgrading a Kinesis ingestion to use the Kinesis input format (by taking its existing input format and setting it as the `valueFormat`) can be done without losing any of the payload data.
+
+Configure the Kinesis `inputFormat` as follows:
+
+| Field | Type | Description | Required | Default |
+|-------|------|---------------------------------------------------------------------------------------------------------------------------------------------------|----------|---------------------|
+| `type` | String | Set value to `kinesis`. | yes ||
+| `valueFormat` | [InputFormat](#input-format) | The [input format](#input-format) to parse the Kinesis value payload. | yes ||
+| `partitionKeyColumnName` | String | The name of the column for the Kinesis partition key. This field is useful when ingesting data from multiple partitions into the same datasource. | no | `kinesis.partitionKey` |
+| `timestampColumnName` | String | The name of the column for the Kinesis timestamp. | no | `kinesis.timestamp` |
+
+#### Example
+
+Using `{ "type": "json" }` as the input format would only parse the payload value.
+To parse the Kinesis metadata in addition to the payload, use the `kinesis` input format.
+
+For example, consider the following structure for a Kinesis message that represents an edit in a development environment:
+
+- **Kinesis timestamp**: `1680795276351`
+- **Kinesis partition key**: `partition-1`
+- **Kinesis payload value**: `{"channel":"#sv.wikipedia","timestamp":"2016-06-27T00:00:11.080Z","page":"Salo Toraut","delta":31,"namespace":"Main"}`
+
+You would configure it as follows:
+
+```json
+{
+  "ioConfig": {
+    "inputFormat": {
+      "type": "kinesis",
+      "valueFormat": {
+        "type": "json"
+      },
+      "timestampColumnName": "kinesis.timestamp",
+      "partitionKeyColumnName": "kinesis.partitionKey"
+    }
+  }
+}
+```
+
+You would parse the example message as follows:
+
+```json
+{
+  "channel": "#sv.wikipedia",
+  "timestamp": "2016-06-27T00:00:11.080Z",
+  "page": "Salo Toraut",
+  "delta": 31,
+  "namespace": "Main",
+  "kinesis.timestamp": 1680795276351,
+  "kinesis.partitionKey": "partition-1"
+}
+```
+
+If you want to use `kinesis.timestamp` as Druid's primary timestamp (`__time`), specify it as the value for `column` in the `timestampSpec`:
+
+```json
+"timestampSpec": {
+  "column": "kinesis.timestamp",
+  "format": "millis"
+}
+```
+
+Finally, add these Kinesis metadata columns to the `dimensionsSpec` or  set your `dimensionsSpec` to auto-detect columns.

Review Comment:
   ```suggestion
   Finally, add these Kinesis metadata columns to the `dimensionsSpec` or set your `dimensionsSpec` to automatically detect columns.
   ```
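
The following minimal Python sketch shows what `"format": "millis"` in the `timestampSpec` above means for the example record. The `kinesis.timestamp` value is read as milliseconds since the Unix epoch, in UTC. It only illustrates the conversion and is not how Druid parses timestamps internally. The helper name `millis_to_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(millis: int) -> datetime:
    """Hypothetical helper: interpret an integer as epoch milliseconds in UTC."""
    # timedelta arithmetic keeps the millisecond part exact (no float rounding).
    return EPOCH + timedelta(milliseconds=millis)


# kinesis.timestamp value from the example record in this hunk.
parsed_row = {"kinesis.timestamp": 1680795276351, "kinesis.partitionKey": "partition-1"}
print(millis_to_utc(parsed_row["kinesis.timestamp"]).isoformat(timespec="milliseconds"))
# prints 2023-04-06T15:34:36.351+00:00
```

The resulting `__time` comes from the Kinesis metadata. It is not the payload's own `timestamp` field, which is `2016-06-27T00:00:11.080Z` in the example.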



##########
docs/ingestion/data-formats.md:
##########
@@ -731,6 +731,148 @@ This query returns:
 |--------------------|-----------|---------------|---------------|
 | `development`      | `wiki-edit` | `1680795276351` | `wiki-edits`  |
 
+### Kinesis
+
+The `kinesis` input format lets you parse the Kinesis metadata fields in addition to the Kinesis payload value contents.
+It should only be used when ingesting from Kinesis.
+
+The `kinesis` input format wraps around the payload parsing input format and augments the data it outputs with the Kinesis event timestamp, and partition key.
+
+If there are conflicts between column names in the payload and those created from the metadata, the payload takes precedence.
+This ensures that upgrading a Kinesis ingestion to use the Kinesis input format (by taking its existing input format and setting it as the `valueFormat`) can be done without losing any of the payload data.
+
+Configure the Kinesis `inputFormat` as follows:
+
+| Field | Type | Description | Required | Default |
+|-------|------|---------------------------------------------------------------------------------------------------------------------------------------------------|----------|---------------------|
+| `type` | String | Set value to `kinesis`. | yes ||
+| `valueFormat` | [InputFormat](#input-format) | The [input format](#input-format) to parse the Kinesis value payload. | yes ||
+| `partitionKeyColumnName` | String | The name of the column for the Kinesis partition key. This field is useful when ingesting data from multiple partitions into the same datasource. | no | `kinesis.partitionKey` |
+| `timestampColumnName` | String | The name of the column for the Kinesis timestamp. | no | `kinesis.timestamp` |
+
+#### Example
+
+Using `{ "type": "json" }` as the input format would only parse the payload value.
+To parse the Kinesis metadata in addition to the payload, use the `kinesis` input format.
+
+For example, consider the following structure for a Kinesis message that represents an edit in a development environment:
+
+- **Kinesis timestamp**: `1680795276351`
+- **Kinesis partition key**: `partition-1`
+- **Kinesis payload value**: `{"channel":"#sv.wikipedia","timestamp":"2016-06-27T00:00:11.080Z","page":"Salo Toraut","delta":31,"namespace":"Main"}`
+
+You would configure it as follows:
+
+```json
+{
+  "ioConfig": {
+    "inputFormat": {
+      "type": "kinesis",
+      "valueFormat": {
+        "type": "json"
+      },
+      "timestampColumnName": "kinesis.timestamp",
+      "partitionKeyColumnName": "kinesis.partitionKey"
+    }
+  }
+}
+```
+
+You would parse the example message as follows:

Review Comment:
   ```suggestion
   You would parse the example record as follows:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
