Padmaprasad Aithal created KAFKA-18830: ------------------------------------------
Summary: Enable parallel processing for transformation chain Key: KAFKA-18830 URL: https://issues.apache.org/jira/browse/KAFKA-18830 Project: Kafka Issue Type: Improvement Components: connect Reporter: Padmaprasad Aithal For Kafka Connect with Debezium plugin, we have below flow Database Logs <- Debezium Plugin <- Kafka Connect -> Transformation Chain (Serial) -> Kafka Topic If transformation chain is expensive operation, overall latency from Database to Topic will be higher. If all transformations in the chain could be run parallell, we can leverage parallel transformation to improve the throughput and reduce overall latency without disturbing the order of events. With parallel transformation, we will be having below flow: {noformat} Database Logs <- Debezium Plugin <- Kafka Connect -> Transformation Chain (Parallel) -> Kafka Topic{noformat} Pseudo Code: {noformat} if (isParallelTransformEnabled) { toSend.parallelStream().forEach(record -> { SourceRecord transformedRecord = transformationChain.apply(record); if (transformedRecord != null) { transformedRecordMap.put(record, transformedRecord); } }); } ... if (isParallelTransformEnabled) { record = transformedRecordMap.get(preTransformRecord); } else { record = transformationChain.apply(preTransformRecord); }{noformat} PS: Since all transformation may not be thread safe and support parallelism, this feature is enabled through the feature flag The same implementation can be found here: https://github.com/confluentinc/kafka/pull/1282 -- This message was sent by Atlassian Jira (v8.20.10#820010)