[ 
https://issues.apache.org/jira/browse/KAFKA-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103967#comment-16103967
 ] 

Jiangjie Qin commented on KAFKA-5621:
-------------------------------------

[~apurva] I am trying to understand the following statement
{quote}
On the other hand, for an application, partitions are not really independent 
(and especially so if you use transactions). If one partition is down, it makes 
sense to wait for it to be ready before continuing. So we would want to handle 
as many errors internally as possible. It would mean blocking sends once the 
queue is too large and not expiring batches in the queue. This simplifies the 
application programming model.
{quote}

Is it really different between applications and MM when a partition cannot 
make progress? It seems in both cases the users would want to know that at 
some point and handle it. I think retries are also for this purpose; 
otherwise we may block forever. If I understand correctly, what this ticket 
proposes is simply to extend the batch expiration time from 
request.timeout.ms to request.timeout.ms * retries, while KIP-91 proposes an 
additional explicit configuration for the batch expiration time instead of 
deriving it from the request timeout. They do not seem very different, 
except that KIP-91 decouples the configurations from each other.
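To make the comparison concrete, here is a minimal arithmetic sketch of the two expiration times being discussed; the values are illustrative, and the KIP-91 config name used here is hypothetical:

```java
public class ExpirationComparison {
    // This ticket: batch expiration derived from existing configs,
    // request.timeout.ms * retries.
    static long derivedExpirationMs(long requestTimeoutMs, int retries) {
        return requestTimeoutMs * retries;
    }

    public static void main(String[] args) {
        long requestTimeoutMs = 30000;
        int retries = 3;
        System.out.println(derivedExpirationMs(requestTimeoutMs, retries));

        // KIP-91: an explicit, decoupled expiration config
        // (hypothetical name and value for illustration).
        long explicitBatchExpiryMs = 90000;
        System.out.println(explicitBatchExpiryMs);
    }
}
```

With these example values the two proposals yield the same expiration; the difference is only whether the number is derived or configured directly.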

KAFKA-5494 is a good improvement. Regarding the error/anomaly handling, if we 
are willing to make public interface changes given that the next release would 
be 1.0.0, I am thinking of the following configurations:
1. request.timeout.ms - needed for the wire timeout
2. expiry.ms - the expiration time for a message. This is an approximate time 
after which a message is expired if it cannot be sent out for whatever reason 
once it is ready for sending (i.e. the batch is ready). In the worst case a 
message would be expired (expiry.ms + request.timeout.ms) after it becomes 
ready for sending (note that the user defines when a message is ready for 
sending by specifying linger.ms and batch.size). expiry.ms should be longer 
than request.timeout.ms, e.g. 2x or 3x.
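A minimal sketch of the worst-case timing under this proposal; note that expiry.ms is the config name proposed in this comment, not one the producer actually has today, and the values are illustrative:

```java
import java.util.Properties;

public class ProposedConfigSketch {
    // Worst-case time before a ready message is expired:
    // expiry.ms plus one in-flight request.timeout.ms.
    static long worstCaseExpiryMs(long expiryMs, long requestTimeoutMs) {
        return expiryMs + requestTimeoutMs;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Existing wire-level timeout.
        props.setProperty("request.timeout.ms", "30000");
        // Proposed per-message expiration, here 3x the request timeout.
        props.setProperty("expiry.ms", "90000");

        long requestTimeoutMs =
            Long.parseLong(props.getProperty("request.timeout.ms"));
        long expiryMs = Long.parseLong(props.getProperty("expiry.ms"));
        System.out.println(worstCaseExpiryMs(expiryMs, requestTimeoutMs));
    }
}
```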

The following configs are optional and will be decided by the producer if not 
specified:
3. min.retries - When this config is specified, the producer will retry at 
least min.retries times, even if that causes the message to stay in the 
producer longer than expiry.ms. This avoids the case where the producer 
cannot retry even once. When retrying, the producer will apply exponential 
backoff internally. This could default to 1.
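The interaction between expiry.ms and min.retries described above can be sketched as follows; the method names and the backoff formula (base doubled per attempt) are hypothetical illustrations, not actual producer internals:

```java
public class RetryExpirySketch {
    // A batch is only expired once it has both exceeded expiry.ms AND
    // been retried at least min.retries times, so the producer is
    // guaranteed at least one retry even under a short expiry.
    static boolean shouldExpire(long elapsedMs, long expiryMs,
                                int attemptsSoFar, int minRetries) {
        return elapsedMs >= expiryMs && attemptsSoFar >= minRetries;
    }

    // Exponential backoff between retries: base * 2^attempt.
    static long backoffMs(int attempt, long baseBackoffMs) {
        return baseBackoffMs << attempt;
    }

    public static void main(String[] args) {
        // With min.retries = 1, a batch past expiry.ms that was never
        // retried is kept for one more attempt before expiring.
        System.out.println(shouldExpire(120000, 90000, 0, 1));
        System.out.println(shouldExpire(120000, 90000, 1, 1));
        System.out.println(backoffMs(3, 100));
    }
}
```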

Hopefully this gives us a cleaner configuration set for the producer.

> The producer should retry expired batches when retries are enabled
> ------------------------------------------------------------------
>
>                 Key: KAFKA-5621
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5621
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Apurva Mehta
>             Fix For: 1.0.0
>
>
> Today, when a batch is expired in the accumulator, a {{TimeoutException}} is 
> raised to the user.
> It might be better for the producer to retry the expired batch, up to the 
> configured number of retries. This is more intuitive from the user's point 
> of view. 
> Further, the proposed behavior makes it easier for applications like mirror 
> maker to provide ordering guarantees even when batches expire. Today, they 
> would resend the expired batch and it would get added to the back of the 
> queue, causing the output ordering to differ from the input ordering.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
