gianm opened a new pull request, #15692: URL: https://github.com/apache/druid/pull/15692
Fixes a bug where `KafkaInputFormat` parsed incoming JSON as newline-delimited (as in batch ingestion) rather than as a whole entity (as is typical for streaming ingestion).

Background: `JsonInputFormat` has a `withLineSplittable` method that controls whether JSON is read line by line or as a whole. The intent is that `lineSplittable` is false in streaming ingestion (although `assumeNewlineDelimited` can override this) and true in batch ingestion. When a `json` format is wrapped by a `kafka` format, `lineSplittable` isn't set properly. This patch updates `KafkaInputFormat` to set it on an underlying `json` format.

The tests for `KafkaInputFormat` were setting the `lineSplittable` parameter explicitly, which made them unrepresentative of what happens in production. They now omit the parameter and get the production behavior.
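To illustrate why the flag matters, here is a minimal, self-contained sketch. It does not use Druid's classes: `ToyJsonFormat` and `ToyKafkaFormat` are hypothetical stand-ins, the `split` and `parseValue` methods are invented for illustration, and the sketch ignores the `assumeNewlineDelimited` override. It only shows how a wrapper that forces `lineSplittable` to false keeps a multi-line JSON message in one piece.

```java
import java.util.ArrayList;
import java.util.List;

class LineSplittableSketch {
    // Hypothetical stand-in for a JSON input format; not Druid's JsonInputFormat.
    static final class ToyJsonFormat {
        final boolean lineSplittable;

        ToyJsonFormat(boolean lineSplittable) {
            this.lineSplittable = lineSplittable;
        }

        // Returns a copy with the flag changed, loosely mirroring withLineSplittable.
        ToyJsonFormat withLineSplittable(boolean value) {
            return new ToyJsonFormat(value);
        }

        // Splits a payload into the chunks that would each be parsed as one JSON entity.
        List<String> split(String payload) {
            List<String> chunks = new ArrayList<>();
            if (lineSplittable) {
                // Batch-style: every non-blank line is treated as its own entity.
                for (String line : payload.split("\n")) {
                    if (!line.trim().isEmpty()) {
                        chunks.add(line.trim());
                    }
                }
            } else {
                // Streaming-style: the whole payload is one entity.
                chunks.add(payload);
            }
            return chunks;
        }
    }

    // Hypothetical stand-in for a Kafka wrapper format; not Druid's KafkaInputFormat.
    static final class ToyKafkaFormat {
        final ToyJsonFormat valueFormat;

        ToyKafkaFormat(ToyJsonFormat valueFormat) {
            // Assumption for this sketch: the wrapper always forces whole-entity parsing.
            this.valueFormat = valueFormat.withLineSplittable(false);
        }

        List<String> parseValue(String payload) {
            return valueFormat.split(payload);
        }
    }

    public static void main(String[] args) {
        // A single pretty-printed message, as a producer might send to a topic.
        String message = "{\n  \"user\": \"alice\",\n  \"clicks\": 3\n}";

        // Line-splitting breaks the message into fragments that are not valid JSON.
        System.out.println(new ToyJsonFormat(true).split(message).size()); // prints 4

        // The wrapper overrides the flag, so the message stays whole.
        System.out.println(new ToyKafkaFormat(new ToyJsonFormat(true)).parseValue(message).size()); // prints 1
    }
}
```

In the sketch the wrapper overrides the flag in its constructor, so callers (including tests) get the same behavior whether or not they set the flag themselves.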
