Hi Dong,

I tried to focus on the steps one can currently perform to expand or shrink a keyed topic while maintaining top-notch semantics. I can understand that there might be confusion about "stopping the producer". It is exactly the same as proposed in the KIP: there needs to be a point in time at which the producers agree on the new partitioning. The extra semantics I want to put in there is the possibility to wait until all the existing data has been copied over into the new partitioning scheme. When I say "stopping", I think of it more as a memory barrier that ensures the ordering. I am still aiming for latencies on the scale of leader failovers.

Consumers have to explicitly adopt the new partitioning scheme in the above scenario. The reason is that in the cases where you depend on a particular partitioning scheme, you frequently also have other topics with co-partitioning requirements or the like. Therefore all your other input topics might need to grow accordingly.


What I was suggesting was to streamline all these operations as much as possible, so that we get "real" partition growth and shrinkage. Migrating the producers to a new partitioning scheme can be made much smoother with proper broker support. Migrating consumers is a step that might become completely unnecessary if, for example, Streams took the gcd of the partition counts as the co-partitioning unit instead of enforcing a 1:1 match. Connect consumers and other plain consumers should be fine anyway.
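To illustrate the gcd idea with plain arithmetic (this is not Streams code, just the property it would rely on): with the usual hash-mod partitioner, if g = gcd(pA, pB) then g divides both partition counts, so the shared bucket hash % g can be read off the partition index in either topic, and a consumer could co-group partitions at gcd granularity instead of requiring identical counts.

```python
from math import gcd

# Sketch of the gcd co-partitioning argument. Assumes the usual
# hash-mod partitioner: partition = hash(key) % partition_count.
pA, pB = 6, 9                 # partition counts of two co-partitioned topics
g = gcd(pA, pB)               # = 3: the coarser, shared granularity

def bucket(partition: int) -> int:
    # Because g divides the partition count, (h % p) % g == h % g,
    # so the shared bucket is recoverable from the partition index alone.
    return partition % g

for h in range(1000):         # h stands in for the key hash
    assert bucket(h % pA) == bucket(h % pB)
```

So a record keyed with hash h lands in bucket h % g in both topics, no matter which of the two partition counts each topic has.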

I hope this makes it clearer what I was aiming at. The rest needs to be figured out. The only danger I see is that if we introduce this feature as proposed in the KIP, it won't help anyone who depends on log compaction.

The other thing I wanted to mention is that I believe the current suggestion (without copying data over) can be implemented in pure userland with a custom partitioner and a small feedback loop from ProduceResponse => Partitioner, in cooperation with a change management system.
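For what it's worth, a rough userland sketch of that idea could look like the following. EpochStore and the epoch-switch logic are invented names standing in for the change management system, not any real Kafka API; in a real producer, a ProduceResponse error would be the feedback telling the partitioner to refresh its epoch.

```python
import hashlib

class EpochStore:
    """Stand-in for the change management system: it publishes, per epoch,
    the partition count all producers must agree on. (Invented for this
    sketch; not a Kafka API.)"""
    def __init__(self):
        self._counts = {0: 8}                  # epoch 0: 8 partitions

    def announce(self, epoch: int, partitions: int):
        self._counts[epoch] = partitions

    def partition_count(self, epoch: int) -> int:
        latest = max(e for e in self._counts if e <= epoch)
        return self._counts[latest]

def partition(key: bytes, epoch: int, store: EpochStore) -> int:
    # Userland partitioner: deterministic hash mod the epoch's count.
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return h % store.partition_count(epoch)

store = EpochStore()
before = partition(b"user-42", epoch=0, store=store)   # routed among 8
store.announce(5, 16)                                  # grow to 16 at epoch 5
after = partition(b"user-42", epoch=5, store=store)    # routed among 16
```

All producers on the same epoch route identically, and the switch point is the "memory barrier" mentioned above.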

Best Jan

On 28.02.2018 07:13, Dong Lin wrote:
Hey Jan,

I am not sure if it is acceptable for the producer to be stopped for a while,
particularly for online applications which require low latency. I am also
not sure how consumers can switch to a new topic. Does the user application
need to explicitly specify a different topic for the producer/consumer to
subscribe to? It would be helpful for the discussion if you could provide more
detail on the interface change for this solution.

Thanks,
Dong

On Mon, Feb 26, 2018 at 12:48 AM, Jan Filipiak <jan.filip...@trivago.com>
wrote:

Hi,

just want to throw my thoughts in. In general the functionality is very
useful; we should, though, not try to pin down the architecture too hard
while implementing.

The manual steps would be:

1. Create a new topic.
2. Mirror-make from the old topic to the new topic.
3. Wait for mirror making to catch up.
4. Then put the consumers onto the new topic
   (having MirrorMaker spit out a mapping from old offsets to new offsets:
   if the partition count is increased by a factor X there is going to be a
   clean mapping from one offset in the old topic to X offsets in the new
   topic; if there is no such factor then there is no chance to generate a
   mapping that can reasonably be used for continuing).
5. Make consumers stop at appropriate points and continue consumption
   with offsets from the mapping.
6. Have the producers stop for a minimal time.
7. Wait for MirrorMaker to finish.
8. Let the producers produce with the new metadata.
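The factor-X offset mapping in step 4 can be sketched as follows. This assumes hash-mod partitioning and a P -> P*X expansion; under those assumptions each old partition fans out into exactly X new partitions (indices old_p + k*P), which is what makes the mapping clean. The function is an illustration of the bookkeeping, not MirrorMaker itself.

```python
def mirror_partition(records, old_p, P, X):
    """records: ordered (old_offset, key_hash) pairs from old partition old_p.
    Returns {old_offset: (new_partition, new_offset)} for a P -> P*X
    expansion with hash-mod partitioning."""
    # The only possible targets: partitions congruent to old_p mod P.
    next_offset = {old_p + k * P: 0 for k in range(X)}
    mapping = {}
    for old_offset, h in records:
        new_p = h % (P * X)
        # Clean fan-out: h % P == old_p implies (h % (P*X)) % P == old_p.
        assert new_p % P == old_p
        mapping[old_offset] = (new_p, next_offset[new_p])
        next_offset[new_p] += 1
    return mapping

# Old partition 1 of a 4-partition topic, expanding by factor 3 to 12.
recs = [(0, 5), (1, 9), (2, 13), (3, 17)]      # all key hashes == 1 mod 4
m = mirror_partition(recs, old_p=1, P=4, X=3)
```

A committed consumer offset in the old partition then translates to a (partition, offset) pair in the new topic via this table, which is exactly what makes "continue consumption with offsets from the mapping" possible.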


Instead of implementing the approach suggested in the KIP, which will leave
log-compacted topics completely crumbled and unusable, I would much rather
try to build infrastructure to support the operations mentioned above more
smoothly.
Especially having producers stop and use another topic is difficult, and
it would be nice if one could trigger "invalid metadata" exceptions for them,
and if one could give topics aliases so that their produces with the old
topic name will arrive in the new topic.

The downsides are obvious I guess (having the same data twice for the
transition period, but Kafka tends to scale well with data size). Still, it
is a nicer fit into the architecture.

I further want to argue that the functionality proposed by the KIP can be
completely implemented in "userland" with a custom partitioner that handles
the transition as needed. I would appreciate it if someone could point out
what a custom partitioner couldn't handle in this case.

With the above approach, shrinking a topic involves the same steps, without
losing keys in the discontinued partitions.

Would love to hear what everyone thinks.

Best Jan


On 11.02.2018 00:35, Dong Lin wrote:

Hi all,

I have created KIP-253: Support in-order message delivery with partition
expansion. See
https://cwiki.apache.org/confluence/display/KAFKA/KIP-253%3A+Support+in-order+message+delivery+with+partition+expansion

This KIP provides a way to allow messages of the same key from the same
producer to be consumed in the same order they are produced even if we
expand the number of partitions of the topic.

Thanks,
Dong


