loquisgon commented on a change in pull request #11796:
URL: https://github.com/apache/druid/pull/11796#discussion_r727524064
##########
File path: docs/ingestion/data-formats.md
##########
@@ -179,47 +192,28 @@ The `inputFormat` to load complete kafka record including header, key and value.
 }
 ```
-The KAFKA `inputFormat` has the following components:
-
-> Note that KAFKA inputFormat is currently designated as experimental.
-
-| Field | Type | Description | Required |
-|-------|------|-------------|----------|
-| type | String | This should say `kafka`. | yes |
-| headerLabelPrefix | String | A custom label prefix for all the header columns. | no (default = "kafka.header.") |
-| timestampColumnName | String | Specifies the name of the column for the kafka record's timestamp.| no (default = "kafka.timestamp") |
-| keyColumnName | String | Specifies the name of the column for the kafka record's key.| no (default = "kafka.key") |
-| headerFormat | Object | headerFormat specifies how to parse the kafka headers. Current supported type is "string". Since header values are bytes, the current parser by defaults reads it as UTF-8 encoded strings. There is flexibility to change this behavior by implementing your very own parser based on the encoding style. The 'encoding' type in KafkaStringHeaderFormat class needs to change with the custom implementation. | no |
-| keyFormat | [InputFormat](#input-format) | keyFormat can be any existing inputFormat to parse the kafka key. The current behavior is to only process the first entry of the input format. See [the below section](../development/extensions-core/kafka-ingestion.md#specifying-data-format) for details about specifying the input format. | no |
-| valueFormat | [InputFormat](#input-format) | valueFormat can be any existing inputFormat to parse the kafka value payload. See [the below section](../development/extensions-core/kafka-ingestion.md#specifying-data-format) for details about specifying the input format. | yes |
+Not the following behaviors:
Review comment:
Typo: "Not" should be "Note" (?)
##########
File path: docs/ingestion/data-formats.md
##########
@@ -179,47 +192,28 @@ The `inputFormat` to load complete kafka record including header, key and value.
 }
 ```
-The KAFKA `inputFormat` has the following components:
-
-> Note that KAFKA inputFormat is currently designated as experimental.
-
-| Field | Type | Description | Required |
-|-------|------|-------------|----------|
-| type | String | This should say `kafka`. | yes |
-| headerLabelPrefix | String | A custom label prefix for all the header columns. | no (default = "kafka.header.") |
-| timestampColumnName | String | Specifies the name of the column for the kafka record's timestamp.| no (default = "kafka.timestamp") |
-| keyColumnName | String | Specifies the name of the column for the kafka record's key.| no (default = "kafka.key") |
-| headerFormat | Object | headerFormat specifies how to parse the kafka headers. Current supported type is "string". Since header values are bytes, the current parser by defaults reads it as UTF-8 encoded strings. There is flexibility to change this behavior by implementing your very own parser based on the encoding style. The 'encoding' type in KafkaStringHeaderFormat class needs to change with the custom implementation. | no |
-| keyFormat | [InputFormat](#input-format) | keyFormat can be any existing inputFormat to parse the kafka key. The current behavior is to only process the first entry of the input format. See [the below section](../development/extensions-core/kafka-ingestion.md#specifying-data-format) for details about specifying the input format. | no |
-| valueFormat | [InputFormat](#input-format) | valueFormat can be any existing inputFormat to parse the kafka value payload. See [the below section](../development/extensions-core/kafka-ingestion.md#specifying-data-format) for details about specifying the input format. | yes |
+Not the following behaviors:
+- Druid uses Kafka's column names to resolve conflicts with dimension or metric names. This behavior makes the Kafka `inputFormat` compatible with existing Kafka input formats but adds columns from kafka header and key.
Review comment:
I feel that this passage from the PR's description better explains the conflict resolution...your call what to use, though:
"During conflicts in dimension/metrics names, the code will prefer dimension
names from payload and ignore the dimension either from headers/key. This is
done so that existing input formats can be easily migrated to this new format
without worrying about losing information."
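For reference while reviewing the replaced table, a minimal `kafka` inputFormat spec using the documented fields might look like the sketch below. The top-level field names and defaults come from the table in the removed hunk; the nested `keyFormat`/`valueFormat` values are illustrative placeholders, not taken from the PR:

```json
{
  "type": "kafka",
  "headerLabelPrefix": "kafka.header.",
  "timestampColumnName": "kafka.timestamp",
  "keyColumnName": "kafka.key",
  "headerFormat": { "type": "string" },
  "keyFormat": { "type": "csv", "columns": ["key"] },
  "valueFormat": { "type": "json" }
}
```

Per the conflict-resolution behavior quoted above, dimension names coming from the value payload would win over same-named columns derived from the header or key.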
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]