[ 
https://issues.apache.org/jira/browse/KAFKA-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17696089#comment-17696089
 ] 

fujian commented on KAFKA-14768:
--------------------------------

Hi  [~showuon]:

I create another PR : [KAFKA-14768: add new configure to reduce the 
max.block.ms safely by jiafu1115 · Pull Request #13335 · apache/kafka 
(github.com)|https://github.com/apache/kafka/pull/13335/files] for your 
reference.

The PR is the solution 2 I mentioned. Thought it can't solve the issue 1, but 
it can solve issue 2. 

WDTY? which one is better? I think combining two of them are better one which 
can solve all of the issues. but I think it is also ok for one by one.

Thanks

 

 

> proposal to reduce the first message's send time cost and max block time for 
> safety 
> ------------------------------------------------------------------------------------
>
>                 Key: KAFKA-14768
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14768
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>    Affects Versions: 3.3.1, 3.3.2
>            Reporter: fujian
>            Assignee: hzh0425
>            Priority: Major
>              Labels: needs-kip, performance
>
> Hi, Team:
>  
> Nice to meet you!
>  
> In our business, we found two types of issue which need to improve:
>  
> *(1) Take much time to send the first message*
> Sometimes, we found the users' functional interaction take a lot of time. At 
> last, we figure out the root cause is that after we complete deploy or 
> restart the servers. The first message's delivery on each application server 
> by kafka client will take much time.
> So, we try to find one solution to improve it.
>  
> After analyzing the source code about the first time's sending logic. The 
> time cost is caused by the getting metadata before the sending. The latter's 
> sending won't take the much time due to the cached metadata. The logic is 
> right and necessary. Thus, we still want to improve the experience for the 
> first message's send/user first interaction.
>  
> *(2) can't reduce the send message's block time to wanted value.*
> Sometimes our application's thread will block for max.block.ms to send 
> message. When we try to reduce the max.block.ms to reduce the blocking time. 
> It can't meet the getting metadata's time requirement sometimes. The root 
> cause is the configured max.block.ms is shared with "get metadata" operation 
> and "send message" operation. We can refer to follow tables:
> |*where to block*
>  |*when it is blocked*
>  |*how long it will be blocked?*
>  |
> |org.apache.kafka.clients.producer.KafkaProducer#waitOnMetadata|the first 
> request which need to load the metadata from kafka|<max.block.ms|
> |org.apache.kafka.clients.producer.internals.RecordAccumulator#append|at peak 
> time for business, if the network can’t send message in short 
> time.|<max.block.ms|
>  
> What's the solution for the above two issues:
> I think about current logic and figure out followed possible solution:
> (1) send one "warmup" message, thus we can't send any fake message.
> (2) provide one extra configure time configure which dedicated for getting 
> metadata. thus it will break the define for the max.block.ms
> (3) add one method to call waitOnMetadata with one timeout setting without 
> using the max.block.ms  (PR: [KAFKA-14768: provide new method to warmup first 
> record's sending and reduce the max.block.ms safely by jiafu1115 · Pull 
> Request #13320 · apache/kafka 
> (github.com)|https://github.com/apache/kafka/pull/13320])
>  
> _note:  org.apache.kafka.clients.producer.KafkaProducer#waitOnMetadata_
> ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long 
> nowMs, long maxWaitMs)
>  
>  __ 
> after the change, we can call it before the service is marked as ready. After 
> the ready. it won't block to get metadata due to cache. And then we can be 
> safe to reduce the max.block.ms to a lower value to reduce thread's blocking 
> time.
>  
> After adopting the solution 3. we solve the above issues. For example, we 
> reduce the first message's send about 4s seconds. The log can refer to 
> followed:
> _warmup test_topic at phase phase 2: get metadata from mq start_
> _warmup test_topic at phase phase 2: get metadata from mq end consume 
> *4669ms*_
> And after the change, we reduce the max.block.ms from 10s to 2s without worry 
> can't get metadata.
>  
> {*}So what's your thought for these two issues and the solution I 
> proposed{*}. I hope to get your feedback and thought for the issues.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to