Padmaprasad Aithal created KAFKA-18830:
------------------------------------------

             Summary: Enable parallel processing for transformation chain
                 Key: KAFKA-18830
                 URL: https://issues.apache.org/jira/browse/KAFKA-18830
             Project: Kafka
          Issue Type: Improvement
          Components: connect
            Reporter: Padmaprasad Aithal


For Kafka Connect with Debezium plugin, we have below flow

Database Logs <- Debezium Plugin <- Kafka Connect -> Transformation Chain 
(Serial) -> Kafka Topic

If transformation chain is expensive operation, overall latency from Database 
to Topic will be higher. If all transformations in the chain could be run 
parallell, we can leverage parallel transformation to improve the throughput 
and reduce overall latency without disturbing the order of events.

With parallel transformation, we will be having below flow:

 
{noformat}
Database Logs <- Debezium Plugin <- Kafka Connect -> Transformation Chain 
(Parallel) -> Kafka Topic{noformat}
 

Pseudo Code:
{noformat}
if (isParallelTransformEnabled) {
  toSend.parallelStream().forEach(record -> {
  SourceRecord transformedRecord = transformationChain.apply(record);
  if (transformedRecord != null) {
     transformedRecordMap.put(record, transformedRecord);
  }
  });
}

...

if (isParallelTransformEnabled) {
    record = transformedRecordMap.get(preTransformRecord);
} else {
    record = transformationChain.apply(preTransformRecord);
}{noformat}
PS:

Since all transformation may not be thread safe and support parallelism, this 
feature is enabled through the feature flag

 

The same implementation can be found here: 
https://github.com/confluentinc/kafka/pull/1282



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to