[ https://issues.apache.org/jira/browse/KAFKA-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guozhang Wang updated KAFKA-3429:
---------------------------------
    Description: 
Currently, KTable aggregate operations may need a repartition, since the 
aggregation key may not be the same as the original primary key. For these 
operations we require users to provide serdes (defaulting to the configured 
ones) for reading from and writing to the internally created repartition 
topic. However, these serdes are not necessary, since every KTable instance is 
generated either from a topic directly:

{code}table = builder.table(...){code}

or from aggregation operations:

{code}table = stream.aggregate(...){code}

In both cases serdes are already provided for materializing the data, and hence 
the same serdes can be re-used when the resulting KTable is involved in future 
aggregation operations. For example:

{code}
table1 = stream.aggregateByKey(serde);
table2 = table1.aggregate(aggregator, selector, originalSerde, aggregateSerde);
{code}

We would not need to require users to specify the "originalSerde" in 
table1.aggregate since it could always reuse the "serde" from 
stream.aggregateByKey, which is used to materialize the table1 object.

To get rid of it, implementation-wise we need to carry the serde information 
along with the KTableImpl instance so that it can be re-used in a future 
operation that requires repartitioning.
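
As a rough illustration of carrying the serde information along with the table 
instance, here is a minimal, self-contained sketch. It does not use the Kafka 
Streams API: SimpleSerde, SketchTable, reduceByNewKey and SerdeCarryingSketch 
are hypothetical names invented for this example, and the "repartition topic" 
is only simulated by a serialize / deserialize round trip. The point is that 
the caller passes a serde only for the new key type, while the value serde is 
taken from the table it was materialized with.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical stand-in for a serde; this is NOT the Kafka Serde interface.
interface SimpleSerde<T> {
    byte[] serialize(T value);
    T deserialize(byte[] bytes);
}

// Hypothetical table handle that remembers the serdes it was materialized with.
final class SketchTable<K, V> {
    final Map<K, V> data;
    final SimpleSerde<K> keySerde;
    final SimpleSerde<V> valueSerde;

    SketchTable(Map<K, V> data, SimpleSerde<K> keySerde, SimpleSerde<V> valueSerde) {
        this.data = data;
        this.keySerde = keySerde;
        this.valueSerde = valueSerde;
    }

    // Re-keys and reduces the table. The caller supplies a serde only for the
    // new key type; the value serde is re-used from this instance.
    // Assumes K1 is Comparable, because a TreeMap holds the result.
    <K1> SketchTable<K1, V> reduceByNewKey(Function<K, K1> selector,
                                           BinaryOperator<V> adder,
                                           SimpleSerde<K1> newKeySerde) {
        Map<K1, V> result = new TreeMap<>();
        for (Map.Entry<K, V> entry : data.entrySet()) {
            K1 newKey = selector.apply(entry.getKey());
            // Simulated write to, and read from, the repartition topic.
            byte[] keyBytes = newKeySerde.serialize(newKey);
            byte[] valueBytes = valueSerde.serialize(entry.getValue());
            result.merge(newKeySerde.deserialize(keyBytes),
                         valueSerde.deserialize(valueBytes), adder);
        }
        // The carried value serde travels on to the resulting table.
        return new SketchTable<>(result, newKeySerde, valueSerde);
    }
}

final class SerdeCarryingSketch {
    static final SimpleSerde<String> STRING_SERDE = new SimpleSerde<String>() {
        public byte[] serialize(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }
        public String deserialize(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    };

    static final SimpleSerde<Long> LONG_SERDE = new SimpleSerde<Long>() {
        public byte[] serialize(Long value) {
            return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
        }
        public Long deserialize(byte[] bytes) {
            return ByteBuffer.wrap(bytes).getLong();
        }
    };

    // Builds a small table keyed by "user@region" and re-aggregates it by region.
    static String demo() {
        Map<String, Long> counts = new TreeMap<>();
        counts.put("alice@eu", 2L);
        counts.put("bob@eu", 3L);
        counts.put("carol@us", 5L);
        SketchTable<String, Long> table1 =
                new SketchTable<>(counts, STRING_SERDE, LONG_SERDE);
        // No value serde is passed here; table1 already carries LONG_SERDE.
        SketchTable<String, Long> table2 = table1.reduceByNewKey(
                key -> key.substring(key.indexOf('@') + 1), Long::sum, STRING_SERDE);
        return table2.data.toString();
    }
}
```

Calling SerdeCarryingSketch.demo() returns "{eu=5, us=5}". How KTableImpl 
would actually store and expose the serdes is left open here; the sketch only 
shows that the information needed for repartitioning is already available on 
the upstream table.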

  was:
Currently in KTable stateful operations, we require the users to provide serdes 
(default to configured ones) for repartitioning. However, these are not 
necessary since for all KTable instances either generated from the topics 
directly:

{code}builder.table(...){code}

or from aggregation operations:

{code}stream.aggregate(...){code}

There are already serde provided for materializing the data, and hence the same 
serde can be re-used when the resulted KTable is involved in future stateful 
operations. Implementation-wise we need to carry the serde information along 
with the KTableImpl instance in order to do it.


> Remove Serdes needed for repartitioning in KTable stateful operations
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-3429
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3429
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Guozhang Wang
>              Labels: newbie++
>             Fix For: 0.10.0.1
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
