[https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18036673#comment-18036673]
fujian commented on KAFKA-14768:
--------------------------------
In the end, I think we can simply improve the documentation in
[https://github.com/apache/kafka/pull/20842] to balance the reward against the effort.
Thanks.
> proposal to reduce the first message's send time cost and max block time for
> safety
> ------------------------------------------------------------------------------------
>
> Key: KAFKA-14768
> URL: https://issues.apache.org/jira/browse/KAFKA-14768
> Project: Kafka
> Issue Type: Improvement
> Components: clients
> Affects Versions: 3.3.1, 3.3.2
> Reporter: fujian
> Assignee: hzh0425
> Priority: Major
> Labels: needs-kip, performance
>
> Hi, Team:
>
> Nice to meet you!
>
> In our business, we found two kinds of issues that need improvement:
>
> *(1) Sending the first message takes a long time*
> Sometimes our users' interactions took a long time to complete. We eventually
> found the root cause: after we deploy or restart the servers, the first
> message sent by the Kafka client on each application server takes a long time
> to deliver.
> So we looked for a way to improve this.
>
> After analyzing the source code for the first send, we found that the time
> is spent fetching metadata before the message is sent. Later sends are much
> faster because the metadata is cached. This logic is correct and necessary,
> but we still want to improve the experience of the first send, which is also
> the user's first interaction.
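To make the "only the first send pays" behaviour concrete, here is a minimal toy model, assuming a simple per-topic cache. `LazyMetadataCache` and its methods are hypothetical names for illustration only; this is not Kafka's `Metadata` class, and it ignores the periodic refresh and expiry that the real client performs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical toy model of a producer-side metadata cache (NOT Kafka's API).
class LazyMetadataCache {
    private final Map<String, Integer> partitionCounts = new HashMap<>();
    private final Function<String, Integer> fetcher; // stands in for a broker metadata request
    private int remoteFetches = 0;

    LazyMetadataCache(Function<String, Integer> fetcher) {
        this.fetcher = fetcher;
    }

    // Returns the partition count, contacting the "broker" only on a cache miss.
    int partitionsFor(String topic) {
        Integer cached = partitionCounts.get(topic);
        if (cached != null) {
            return cached; // fast path: later sends reuse the cached metadata
        }
        remoteFetches++; // slow path: only the first send for a topic pays this cost
        int count = fetcher.apply(topic);
        partitionCounts.put(topic, count);
        return count;
    }

    int remoteFetches() {
        return remoteFetches;
    }

    public static void main(String[] args) {
        LazyMetadataCache cache = new LazyMetadataCache(topic -> 6);
        cache.partitionsFor("test_topic"); // first send: remote fetch
        cache.partitionsFor("test_topic"); // second send: cache hit
        System.out.println("remote fetches: " + cache.remoteFetches()); // prints "remote fetches: 1"
    }
}
```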
>
> *(2) The send's blocking time can't be reduced to the desired value*
> Sometimes our application threads block for up to max.block.ms when sending a
> message. When we lower max.block.ms to shorten this blocking time, it is
> sometimes too short for the metadata fetch to complete. The root cause is
> that the configured max.block.ms is shared by the "get metadata" operation
> and the "send message" operation, as the following table shows:
> |*Where it blocks*|*When it blocks*|*How long it blocks*|
> |org.apache.kafka.clients.producer.KafkaProducer#waitOnMetadata|On the first request that needs to load metadata from Kafka|Up to max.block.ms|
> |org.apache.kafka.clients.producer.internals.RecordAccumulator#append|At business peak times, when the network can't send messages quickly enough and the producer's buffer fills up|Up to max.block.ms|
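The sharing in the table can be sketched as simple arithmetic, assuming that whatever time the metadata wait consumes is deducted from the same max.block.ms budget before the append may block. `SharedBlockBudget` and `remainingForAppend` are hypothetical names, and the 4669 ms figure is the cold-start metadata time reported later in this issue.

```java
// Hypothetical model (NOT Kafka's API) of one max.block.ms budget shared by the
// metadata wait and the accumulator append.
class SharedBlockBudget {
    // Returns the time left for the append after the metadata wait, or -1 if the
    // metadata wait alone would exceed the budget (i.e. the send times out).
    static long remainingForAppend(long maxBlockMs, long metadataWaitMs) {
        if (metadataWaitMs > maxBlockMs) {
            return -1;
        }
        return maxBlockMs - metadataWaitMs;
    }

    public static void main(String[] args) {
        // Cold start with max.block.ms = 10s: the metadata wait eats most of the budget.
        System.out.println(remainingForAppend(10_000, 4_669)); // 5331
        // Cold start with max.block.ms = 2s: the metadata wait alone times out.
        System.out.println(remainingForAppend(2_000, 4_669));  // -1
        // Warm cache: no metadata wait, so the full 2s is available for the append.
        System.out.println(remainingForAppend(2_000, 0));      // 2000
    }
}
```

This is why lowering max.block.ms is only safe once the metadata is already cached.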
>
> What are the possible solutions to these two issues?
> After thinking through the current logic, I came up with the following options:
> (1) Send a "warmup" message. However, we can't send any fake messages.
> (2) Provide an extra configuration dedicated to the metadata fetch timeout.
> However, this slightly changes the definition of max.block.ms, and it only
> solves issue 2, not issue 1.
> (3) Add a method that calls waitOnMetadata with its own timeout instead of
> max.block.ms (PR: [KAFKA-14768: provide new method to warmup first record's sending and reduce the max.block.ms safely by jiafu1115 · Pull Request #13320 · apache/kafka (github.com)|https://github.com/apache/kafka/pull/13320])
>
> _note: org.apache.kafka.clients.producer.KafkaProducer#waitOnMetadata_
> ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long nowMs, long maxWaitMs)
>
> After this change, we can call the new method before the service is marked as
> ready. Once the service is ready, sends no longer block on fetching metadata
> because it is already cached, so we can safely lower max.block.ms to reduce
> the threads' blocking time.
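The warm-up-before-ready flow of solution 3 can be sketched with a deterministic toy model, assuming simulated per-topic metadata latencies instead of real network calls. `WarmupBeforeReady`, `warmup`, and `metadataWaitOnSend` are hypothetical names, not the API from PR #13320, and the 30 s warm-up timeout is an arbitrary illustrative value. For comparison, the released producer's `KafkaProducer#partitionsFor(String)` also fetches and caches a topic's metadata, but it waits at most max.block.ms rather than a separate timeout.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model (NOT Kafka's API): warm metadata with a dedicated timeout before
// the service reports ready, so later sends can run under a much smaller max.block.ms.
class WarmupBeforeReady {
    private final Map<String, Long> fetchLatencyMs; // simulated metadata latency per topic
    private final Set<String> cachedTopics = new HashSet<>();

    WarmupBeforeReady(Map<String, Long> fetchLatencyMs) {
        this.fetchLatencyMs = fetchLatencyMs;
    }

    // Warm-up step with its own timeout, independent of max.block.ms.
    // Returns true if the metadata "arrived" in time and is now cached.
    boolean warmup(String topic, long warmupTimeoutMs) {
        long latency = fetchLatencyMs.getOrDefault(topic, Long.MAX_VALUE);
        if (latency > warmupTimeoutMs) {
            return false;
        }
        cachedTopics.add(topic);
        return true;
    }

    // Send step: returns how long the caller blocks on metadata, or throws if that
    // wait would exceed max.block.ms.
    long metadataWaitOnSend(String topic, long maxBlockMs) {
        long blockedMs = cachedTopics.contains(topic)
                ? 0
                : fetchLatencyMs.getOrDefault(topic, Long.MAX_VALUE);
        if (blockedMs > maxBlockMs) {
            throw new IllegalStateException("metadata not available within max.block.ms=" + maxBlockMs);
        }
        cachedTopics.add(topic);
        return blockedMs;
    }

    public static void main(String[] args) {
        Map<String, Long> latency = new HashMap<>();
        latency.put("test_topic", 4_669L); // cold-start latency taken from the log in this issue
        WarmupBeforeReady producer = new WarmupBeforeReady(latency);

        boolean ready = producer.warmup("test_topic", 30_000); // run before marking the service ready
        System.out.println("ready=" + ready);                                  // ready=true
        System.out.println(producer.metadataWaitOnSend("test_topic", 2_000)); // 0
    }
}
```

The design point is that the long, one-off metadata wait moves to startup, where a generous timeout is harmless, while max.block.ms only has to cover the steady-state append.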
>
> After adopting solution 3, we solved both issues. For example, we cut the
> first message's send time by about 4 seconds, as the following log shows:
> _warmup test_topic at phase phase 2: get metadata from mq start_
> _warmup test_topic at phase phase 2: get metadata from mq end consume *4669ms*_
> In addition, after the change we reduced max.block.ms from 10s to 2s without
> worrying about failing to get metadata.
>
> {*}So what do you think about these two issues and the solution I
> proposed?{*} I look forward to your feedback.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)