Hi all,

I paused work on this KIP while Kafka 4.0 was in the making, so as not to distract people. Now let's continue the discussion!
Thank you for the comments, Luke! I've applied your suggestions.

Best,
Ivan

On Mon, Dec 23, 2024, at 03:23, Luke Chen wrote:
> Hi Ivan,
>
> Thanks for the KIP!
> This is a great improvement from the cost and latency perspective!
>
> Some comments:
> 1. In the description of `partitioner.rack.aware` config, it'd be better to
> make it clear that this setting has no effect if a custom partitioner is
> used.
>
> 2. "Select the next partition from all partitions following the current
> algorithm in the following cases:"
> I think there should be one more case that "If the "partitioner.rack.aware"
> is false;
>
> 3. "If the automatic partitioning is needed (i.e. no record partition or
> key is specified):"
> I think we should also add the case: "key is provided but
> `partitioner.ignore.keys`
> is enabled"
>
> Thank you.
> Luke
>
>
> On Sat, Dec 21, 2024 at 2:32 AM Stanislav Kozlovski <
> stanislavkozlov...@apache.org> wrote:
>
> > Wow, I am super happy to see this KIP! Thanks for publishing it!
> >
> > I threw the idea out there last week in an article of mine about
> > calculating Kafka costs[1]
> >
> > > [FUTURE KIP] - a Produce to Local Leader KIP, similar to KIP-392, can be
> > introduced to eliminate producer inter-AZ network costs for topics that do
> > not have keys.
> > > there is no fundamental reason that a topic without ordering guarantees
> > needs to produce to a specific partition - why not just choose the broker
> > in the closest zone?
> > > if all of your traffic is unkeyed, then this can further reduce Kafka’s
> > network cost by 25%.
> > > it sounds like a change that wouldn’t be too complicated, maybe even
> > achievable today through the Producer’s partitioner.
> >
> > I don't know if you saw it from there, but I'm super happy to see it come
> > to fruition! It's even easier than I thought - I didn't realize we had the
> > node/rack information in the partitioner already.
> >
> > I think it will be very impactful.
> >
> > We've seen the strong trend in the industry of trading off latency for
> > cost reduction. Namely - almost every vendor has introduced some sort of
> > leaderless Kafka API model that outsources replication to a remote store
> > cost[2][3][4][5]. This in turn allows them to reduce cross-zone networking
> > costs to literally zero. In certain optimized deployments the networking
> > cost can be up to 80-90% of the total cost![6] KIP-392 allows us to
> > eliminate the consumer-side traffic cost, but there is great motivation to
> > enable users to do the same for producers that don't depend on ordering.
> >
> > I am +1 the KIP as is.
> >
> > One may make an argument to have a way to enable it server-side via the
> > broker, but I'd like to hear a good reason for that. I believe the
> > simplicity in the current state is preferred, since clients already have
> > freedom to produce to any partition they explicitly choose.
> >
> > Best,
> > Stan
> >
> > [1]
> > https://bigdata.2minutestreaming.com/p/the-brutal-truth-about-apache-kafka-cost-calculators
> > [2] WarpStream and its $220m acquisition
> > https://www.linkedin.com/pulse/how-confluent-acquired-warpstream-220m-after-just-13-months-hxgyf/
> > [3] Confluent Freight
> > https://www.confluent.io/blog/introducing-confluent-cloud-freight-clusters/
> > [4] RedPanda Cloud Topics
> > https://www.redpanda.com/blog/cloud-topics-streaming-data-object-storage
> > [5] BufStream https://buf.build/product/bufstream
> > [6] calculator https://akalculator.com/
> >
> > On 2024/12/20 11:35:28 Ivan Yurchenko wrote:
> > > Hello all,
> > >
> > > I'd like to propose a new KIP to discuss: KIP-1123: Rack-aware
> > partitioning for Kafka Producer [1].
> > >
> > > Best,
> > > Ivan Yurchenko
> > >
> > > [1]
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1123%3A+Rack-aware+partitioning+for+Kafka+Producer
> > >
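
A rough illustration of the selection rules discussed in this thread (the `partitioner.rack.aware` flag, automatic partitioning when no partition or key is given, and the `partitioner.ignore.keys` case Luke raises) might look like the sketch below. It is not the KIP's implementation and does not use Kafka's partitioner API: the class `RackAwareSketch`, the `PartitionMeta` type, the `clientRack` parameter, and the fallback to all partitions when no leader sits in the client's rack are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class RackAwareSketch {

    // Hypothetical stand-in for partition metadata: the partition id and the
    // rack of its current leader (null when the rack is unknown).
    static final class PartitionMeta {
        final int id;
        final String leaderRack;

        PartitionMeta(int id, String leaderRack) {
            this.id = id;
            this.leaderRack = leaderRack;
        }
    }

    // Automatic partitioning applies when the record names no partition and
    // either has no key or keys are ignored (the `partitioner.ignore.keys` case).
    static boolean automaticPartitioning(Integer partition, byte[] key, boolean ignoreKeys) {
        return partition == null && (key == null || ignoreKeys);
    }

    // Returns the partition ids the next partition would be picked from.
    // With rack awareness off, or with no leader in the client's rack
    // (assumed fallback), every partition remains a candidate.
    static List<Integer> candidatePartitions(List<PartitionMeta> all,
                                             boolean rackAware,
                                             String clientRack) {
        List<Integer> sameRack = new ArrayList<>();
        List<Integer> everything = new ArrayList<>();
        for (PartitionMeta p : all) {
            everything.add(p.id);
            if (rackAware && clientRack != null && clientRack.equals(p.leaderRack)) {
                sameRack.add(p.id);
            }
        }
        return sameRack.isEmpty() ? everything : sameRack;
    }

    public static void main(String[] args) {
        List<PartitionMeta> partitions = List.of(
                new PartitionMeta(0, "az1"),
                new PartitionMeta(1, "az2"),
                new PartitionMeta(2, "az1"));

        // Unkeyed record, rack awareness on, client located in "az1".
        if (automaticPartitioning(null, null, false)) {
            System.out.println(candidatePartitions(partitions, true, "az1"));
        }
    }
}
```

Choosing among the returned candidates would still follow the producer's existing algorithm; the sketch only narrows the set that algorithm draws from.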