loquisgon commented on a change in pull request #11796:
URL: https://github.com/apache/druid/pull/11796#discussion_r727524064
##########
File path: docs/ingestion/data-formats.md
##########
@@ -179,47 +192,28 @@ The `inputFormat` to load complete kafka record including header, key and value.
 }
 ```
-The KAFKA `inputFormat` has the following components:
-
-> Note that KAFKA inputFormat is currently designated as experimental.
-
-| Field | Type | Description | Required |
-|-------|------|-------------|----------|
-| type | String | This should say `kafka`. | yes |
-| headerLabelPrefix | String | A custom label prefix for all the header columns. | no (default = "kafka.header.") |
-| timestampColumnName | String | Specifies the name of the column for the kafka record's timestamp.| no (default = "kafka.timestamp") |
-| keyColumnName | String | Specifies the name of the column for the kafka record's key.| no (default = "kafka.key") |
-| headerFormat | Object | headerFormat specifies how to parse the kafka headers. Current supported type is "string". Since header values are bytes, the current parser by defaults reads it as UTF-8 encoded strings. There is flexibility to change this behavior by implementing your very own parser based on the encoding style. The 'encoding' type in KafkaStringHeaderFormat class needs to change with the custom implementation. | no |
-| keyFormat | [InputFormat](#input-format) | keyFormat can be any existing inputFormat to parse the kafka key. The current behavior is to only process the first entry of the input format. See [the below section](../development/extensions-core/kafka-ingestion.md#specifying-data-format) for details about specifying the input format. | no |
-| valueFormat | [InputFormat](#input-format) | valueFormat can be any existing inputFormat to parse the kafka value payload. See [the below section](../development/extensions-core/kafka-ingestion.md#specifying-data-format) for details about specifying the input format. | yes |
+Not the following behaviors:
Review comment:
Typo: "Not" should be "Note" (?)
##########
File path: docs/ingestion/data-formats.md
##########
@@ -179,47 +192,28 @@ The `inputFormat` to load complete kafka record including header, key and value.
 }
 ```
-The KAFKA `inputFormat` has the following components:
-
-> Note that KAFKA inputFormat is currently designated as experimental.
-
-| Field | Type | Description | Required |
-|-------|------|-------------|----------|
-| type | String | This should say `kafka`. | yes |
-| headerLabelPrefix | String | A custom label prefix for all the header columns. | no (default = "kafka.header.") |
-| timestampColumnName | String | Specifies the name of the column for the kafka record's timestamp.| no (default = "kafka.timestamp") |
-| keyColumnName | String | Specifies the name of the column for the kafka record's key.| no (default = "kafka.key") |
-| headerFormat | Object | headerFormat specifies how to parse the kafka headers. Current supported type is "string". Since header values are bytes, the current parser by defaults reads it as UTF-8 encoded strings. There is flexibility to change this behavior by implementing your very own parser based on the encoding style. The 'encoding' type in KafkaStringHeaderFormat class needs to change with the custom implementation. | no |
-| keyFormat | [InputFormat](#input-format) | keyFormat can be any existing inputFormat to parse the kafka key. The current behavior is to only process the first entry of the input format. See [the below section](../development/extensions-core/kafka-ingestion.md#specifying-data-format) for details about specifying the input format. | no |
-| valueFormat | [InputFormat](#input-format) | valueFormat can be any existing inputFormat to parse the kafka value payload. See [the below section](../development/extensions-core/kafka-ingestion.md#specifying-data-format) for details about specifying the input format. | yes |
+Not the following behaviors:
+- Druid uses Kafka's column names to resolve conflicts with dimension or metric names. This behavior makes the Kafka `inputFormat` compatible with existing Kafka input formats but adds columns from kafka header and key.
Review comment:
I feel that this passage from the PR's description better explains the conflict resolution...your call what to use, though:
"During conflicts in dimension/metrics names, the code will prefer dimension
names from payload and ignore the dimension either from headers/key. This is
done so that existing input formats can be easily migrated to this new format
without worrying about losing information."
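For reference while reviewing the replaced table, a minimal `kafka` inputFormat spec using the documented fields might look like the sketch below. The top-level field names and defaults come from the table in the removed hunk; the nested `keyFormat`/`valueFormat` values are illustrative placeholders, not taken from the PR:

```json
{
  "type": "kafka",
  "headerLabelPrefix": "kafka.header.",
  "timestampColumnName": "kafka.timestamp",
  "keyColumnName": "kafka.key",
  "headerFormat": { "type": "string" },
  "keyFormat": { "type": "csv", "columns": ["key"] },
  "valueFormat": { "type": "json" }
}
```

Per the conflict-resolution behavior quoted above, dimension names coming from the value payload would win over same-named columns derived from the header or key.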
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]