317brian commented on code in PR #16840:
URL: https://github.com/apache/druid/pull/16840#discussion_r1704433228
##########
docs/ingestion/data-formats.md:
##########
@@ -731,6 +731,148 @@ This query returns:
|--------------------|-----------|---------------|---------------|
| `development` | `wiki-edit` | `1680795276351` | `wiki-edits` |
+### Kinesis
+
+The `kinesis` input format lets you parse the Kinesis metadata fields in
addition to the Kinesis payload value contents.
+It should only be used when ingesting from Kinesis.
+
+The `kinesis` input format wraps around the payload parsing input format and
augments the data it outputs with the kinesis event timestamp, and partition
key.
+
+If there are conflicts between column names in the payload and those created
from the metadata, the payload takes precedence.
+This ensures that upgrading a Kinesis ingestion to use the Kinesis input
format (by taking its existing input format and setting it as the
`valueFormat`) can be done without losing any of the payload data.
+
+Configure the Kinesis `inputFormat` as follows:
+
+| Field | Type | Description
| Required | Default |
+|-------|------|---------------------------------------------------------------------------------------------------------------------------------------------------|----------|---------------------|
+| `type` | String | Set value to `kinesis`. | yes ||
+| `valueFormat` | [InputFormat](#input-format) | The [input
format](#input-format) to parse the Kinesis value payload. | yes ||
+| `partitionKeyColumnName` | String | The name of the column for the Kinesis
partition key. This field is useful when ingesting data from multiple
partitions into the same datasource. | no | `kinesis.partitionKey` |
+| `timestampColumnName` | String | The name of the column for the Kinesis
timestamp. | no | `kinesis.timestamp` |
+
+#### Example
+
+Using `{ "type": "json" }` as the input format would only parse the payload
value.
+To parse the Kinesis metadata in addition to the payload, use the `kinesis`
input format.
+
+For example, consider the following structure for a Kinesis message that
represents an edit in a development environment:
Review Comment:
The official Kinesis docs prefer "record" over "message", even though the two
terms mean the same thing.
```suggestion
For example, consider the following structure for a Kinesis record that
represents an edit in a development environment:
```
##########
docs/ingestion/data-formats.md:
##########
@@ -731,6 +731,148 @@ This query returns:
|--------------------|-----------|---------------|---------------|
| `development` | `wiki-edit` | `1680795276351` | `wiki-edits` |
+### Kinesis
+
+The `kinesis` input format lets you parse the Kinesis metadata fields in
addition to the Kinesis payload value contents.
+It should only be used when ingesting from Kinesis.
+
+The `kinesis` input format wraps around the payload parsing input format and
augments the data it outputs with the kinesis event timestamp, and partition
key.
Review Comment:
```suggestion
The `kinesis` input format wraps around the payload parsing input format and
augments the data it outputs with the Kinesis event timestamp and partition
key, the `ApproximateArrivalTimestamp` and `PartitionKey` fields in the
Kinesis record.
```
##########
docs/ingestion/data-formats.md:
##########
@@ -731,6 +731,148 @@ This query returns:
|--------------------|-----------|---------------|---------------|
| `development` | `wiki-edit` | `1680795276351` | `wiki-edits` |
+### Kinesis
+
+The `kinesis` input format lets you parse the Kinesis metadata fields in
addition to the Kinesis payload value contents.
+It should only be used when ingesting from Kinesis.
+
+The `kinesis` input format wraps around the payload parsing input format and
augments the data it outputs with the Kinesis event timestamp, and partition
key.
Review Comment:
```suggestion
The `kinesis` input format wraps around the payload parsing input format and
augments the data it outputs with the Kinesis event timestamp and partition
key, the `ApproximateArrivalTimestamp` and `PartitionKey` fields in the
Kinesis record.
```
##########
docs/ingestion/data-formats.md:
##########
@@ -731,6 +731,148 @@ This query returns:
|--------------------|-----------|---------------|---------------|
| `development` | `wiki-edit` | `1680795276351` | `wiki-edits` |
+### Kinesis
+
+The `kinesis` input format lets you parse the Kinesis metadata fields in
addition to the Kinesis payload value contents.
+It should only be used when ingesting from Kinesis.
+
+The `kinesis` input format wraps around the payload parsing input format and
augments the data it outputs with the Kinesis event timestamp, and partition
key.
+
+If there are conflicts between column names in the payload and those created
from the metadata, the payload takes precedence.
+This ensures that upgrading a Kinesis ingestion to use the Kinesis input
format (by taking its existing input format and setting it as the
`valueFormat`) can be done without losing any of the payload data.
+
+Configure the Kinesis `inputFormat` as follows:
+
+| Field | Type | Description
| Required | Default |
+|-------|------|---------------------------------------------------------------------------------------------------------------------------------------------------|----------|---------------------|
+| `type` | String | Set value to `kinesis`. | yes ||
+| `valueFormat` | [InputFormat](#input-format) | The [input
format](#input-format) to parse the Kinesis value payload. | yes ||
+| `partitionKeyColumnName` | String | The name of the column for the Kinesis
partition key. This field is useful when ingesting data from multiple
partitions into the same datasource. | no | `kinesis.partitionKey` |
+| `timestampColumnName` | String | The name of the column for the Kinesis
timestamp. | no | `kinesis.timestamp` |
+
+#### Example
+
+Using `{ "type": "json" }` as the input format would only parse the payload
value.
+To parse the Kinesis metadata in addition to the payload, use the `kinesis`
input format.
+
+For example, consider the following structure for a Kinesis message that
represents an edit in a development environment:
+
+- **Kinesis timestamp**: `1680795276351`
+- **Kinesis partition key**: `partition-1`
+- **Kinesis payload value**:
`{"channel":"#sv.wikipedia","timestamp":"2016-06-27T00:00:11.080Z","page":"Salo
Toraut","delta":31,"namespace":"Main"}`
+
+You would configure it as follows:
+
+```json
+{
+ "ioConfig": {
+ "inputFormat": {
+ "type": "kinesis",
+ "valueFormat": {
+ "type": "json"
+ },
+ "timestampColumnName": "kinesis.timestamp",
+ "partitionKeyColumnName": "kinesis.partitionKey"
+ }
+ }
+}
+```
+
+You would parse the example message as follows:
+
+```json
+{
+ "channel": "#sv.wikipedia",
+ "timestamp": "2016-06-27T00:00:11.080Z",
+ "page": "Salo Toraut",
+ "delta": 31,
+ "namespace": "Main",
+ "kinesis.timestamp": 1680795276351,
+ "kinesis.partitionKey": "partition-1"
+}
+```
+
+If you want to use `kinesis.timestamp` as Druid's primary timestamp
(`__time`), specify it as the value for `column` in the `timestampSpec`:
+
+```json
+"timestampSpec": {
+ "column": "kinesis.timestamp",
+ "format": "millis"
+}
+```
+
+Finally, add these Kinesis metadata columns to the `dimensionsSpec` or set
your `dimensionsSpec` to auto-detect columns.
Review Comment:
```suggestion
Finally, add these Kinesis metadata columns to the `dimensionsSpec` or set
your `dimensionsSpec` to automatically detect columns.
```
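It might also help readers to see what the two options look like. The following is only a sketch, not text taken from the PR: the payload dimensions and their types are assumptions based on the example record in this section, and the metadata column names are the documented defaults. Listing the columns explicitly could look like this:

```json
{
  "dimensionsSpec": {
    "dimensions": [
      "channel",
      "page",
      "namespace",
      { "type": "long", "name": "delta" },
      { "type": "long", "name": "kinesis.timestamp" },
      "kinesis.partitionKey"
    ]
  }
}
```

Automatic detection would presumably rely on schema auto-discovery instead, again as a sketch:

```json
{
  "dimensionsSpec": {
    "useSchemaDiscovery": true
  }
}
```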
##########
docs/ingestion/data-formats.md:
##########
@@ -731,6 +731,148 @@ This query returns:
|--------------------|-----------|---------------|---------------|
| `development` | `wiki-edit` | `1680795276351` | `wiki-edits` |
+### Kinesis
+
+The `kinesis` input format lets you parse the Kinesis metadata fields in
addition to the Kinesis payload value contents.
+It should only be used when ingesting from Kinesis.
+
+The `kinesis` input format wraps around the payload parsing input format and
augments the data it outputs with the Kinesis event timestamp, and partition
key.
+
+If there are conflicts between column names in the payload and those created
from the metadata, the payload takes precedence.
+This ensures that upgrading a Kinesis ingestion to use the Kinesis input
format (by taking its existing input format and setting it as the
`valueFormat`) can be done without losing any of the payload data.
+
+Configure the Kinesis `inputFormat` as follows:
+
+| Field | Type | Description
| Required | Default |
+|-------|------|---------------------------------------------------------------------------------------------------------------------------------------------------|----------|---------------------|
+| `type` | String | Set value to `kinesis`. | yes ||
+| `valueFormat` | [InputFormat](#input-format) | The [input
format](#input-format) to parse the Kinesis value payload. | yes ||
+| `partitionKeyColumnName` | String | The name of the column for the Kinesis
partition key. This field is useful when ingesting data from multiple
partitions into the same datasource. | no | `kinesis.partitionKey` |
+| `timestampColumnName` | String | The name of the column for the Kinesis
timestamp. | no | `kinesis.timestamp` |
+
+#### Example
+
+Using `{ "type": "json" }` as the input format would only parse the payload
value.
+To parse the Kinesis metadata in addition to the payload, use the `kinesis`
input format.
+
+For example, consider the following structure for a Kinesis message that
represents an edit in a development environment:
+
+- **Kinesis timestamp**: `1680795276351`
+- **Kinesis partition key**: `partition-1`
+- **Kinesis payload value**:
`{"channel":"#sv.wikipedia","timestamp":"2016-06-27T00:00:11.080Z","page":"Salo
Toraut","delta":31,"namespace":"Main"}`
+
+You would configure it as follows:
+
+```json
+{
+ "ioConfig": {
+ "inputFormat": {
+ "type": "kinesis",
+ "valueFormat": {
+ "type": "json"
+ },
+ "timestampColumnName": "kinesis.timestamp",
+ "partitionKeyColumnName": "kinesis.partitionKey"
+ }
+ }
+}
+```
+
+You would parse the example message as follows:
Review Comment:
```suggestion
You would parse the example record as follows:
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]