[ 
https://issues.apache.org/jira/browse/KAFKA-3543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236355#comment-15236355
 ] 

Guozhang Wang commented on KAFKA-3543:
--------------------------------------

In Kafka Streams we do not materialize in each operator, but only necessary in 
those stateful operators (like joins and aggregates); for stateless operators 
like map / filter / flatMap you do not need to provide the serdes. More 
concretely, let's say you have the following topology:

{code}
stream1.map(...) // no serde needed
             .map(...) // no serde needed
             .join(stream2)  // now you need to provide the serde for data 
types returned from the second map
{code}

You only need to provide the serde in the join operator, and the returned value 
from the first map will be passed directly to the second map, and hence not 
need to materialized anywhere.

> Allow a variant of transform() which can emit multiple values
> -------------------------------------------------------------
>
>                 Key: KAFKA-3543
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3543
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 0.10.0.0
>            Reporter: Greg Fodor
>            Assignee: Guozhang Wang
>
> Right now it seems that if you want to apply an arbitrary stateful 
> transformation to a stream, you either have to use a TransformerSupplier or 
> ProcessorSupplier sent to transform() or process(). The custom processor will 
> allow you to emit multiple new values, but the process() method currently 
> terminates that branch of the topology so you can't apply additional data 
> flow. transform() lets you continue the data flow, but forces you to emit a 
> single value for every input value.
> (It actually doesn't quite force you to do this, since you can hold onto the 
> ProcessorContext and emit multiple, but that's probably not the ideal way to 
> do it :))
> It seems desirable to somehow allow a transformation that emits multiple 
> values per input value. I'm not sure of the best way to factor this inside of 
> the current TransformerSupplier/Transformer architecture in a way that is 
> clean and efficient -- currently I'm doing the workaround above of just 
> calling forward() myself on the context and actually emitting dummy values 
> which are filtered out downstream.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to