Hi Diego,

Thanks for writing this KIP. Are you still planning to work on this? If
not, I'd be happy to try and take this to completion!

Hi Chris,

Thanks for your valuable inputs as always!

> It looks like this KIP does not take the SinkTask::open and
SinkTask::close methods into account

> I think we may want to add this type of support to the KIP so that we can
solve the mutating SMT/asynchronous sink connector problem once and for all

Could you please clarify what you mean here? Do you mean to say that the
partitions passed to the SinkTask::open / SinkTask::close methods should
not be the original topic partitions? If so, how would we be able to change
that while maintaining backward compatibility? Would we want to add new
methods to SinkTask's public API which connectors could choose to implement
instead? Also, currently SinkTask::open is called when partitions are
assigned to the task's consumer in a consumer rebalance; at this point
there is no way to know of any "transformed" topic partitions - that can
only be determined after the consumer polls and records are converted +
transformed. Similarly, SinkTask::close is called when partitions are
revoked from the task's consumer in a consumer rebalance. Are you
suggesting we change when SinkTask::open and SinkTask::close are called
(maybe by book-keeping all currently known post-transformation topic
partitions)?

>  I agree with the rationale for not exposing more of the original
consumer record for the most part, but what about the record's offset?

Should we pollute the public API with a method that in all likelihood will
never be used? Maybe I'm lacking imagination here, but why would an SMT
ever want to modify the record's offset? Is there any such SMT currently?
It seems to me that adding such a method may just be unnecessary complexity
and a potential source of confusion to Connect developers, WDYT?

> do you think it'd make sense to separate out the newly-proposed
SinkTask::originalTopicPartition method into separate
SinkTask::originalTopic and SinkTask::originalKafkaPartition methods, to
stay in line with the convention that's been loosely set by the existing,
separate SinkTask::topic and SinkTask::kafkaPartition methods?

I would tend to agree with this rationale.

Thanks,
Yash

On 2021/11/03 22:17:06 Diego Erdody wrote:
> Hello,
>
> I'd like to propose a small KIP to add a new field to SinkRecord in order
> to add support for topic-mutating SMTs (e.g. RegexRouter) to asynchronous
> Sink Connectors (the ones that override preCommit for internal offset
> tracking, like S3
> <
https://github.com/confluentinc/kafka-connect-storage-cloud/blob/master/kafka-connect-s3/src/main/java/io/confluent/connect/s3/S3SinkTask.java#L274
>
> ).
>
> Links:
>
> - KIP-793: Sink Connectors: Support topic-mutating SMTs for async
> connectors (preCommit users)
> <https://cwiki.apache.org/confluence/x/fpFnCw>
> - PR #11464 <https://github.com/apache/kafka/pull/11464>
>
> Thanks,
>
> Diego
>

Reply via email to