[
https://issues.apache.org/jira/browse/KAFKA-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750375#comment-13750375
]
Jay Kreps commented on KAFKA-955:
---------------------------------
Great fix. A few minor comments, mostly stylistic.
RequestChannel.scala:
1. This usage exposes a bit much:
requestChannel.sendResponse(new RequestChannel.Response(request.processor,
request, null, RequestChannel.CloseSocket))
I think it might be nicer to have this instead:
requestChannel.close(request.processor, request)
and
requestChannel.noResponse(req.processor, request)
Implementation would be the same, it just would just be a little more clear for
the user and the response codes can be private.
Likewise in the response object I should be able to
2. These are a little confusing:
val SendResponse: Short = 0
val NoResponse: Short = 1
val CloseSocket: Short = 9
Why is it 0, 1, and 9?
What is the relationship between these and ErrorMapping? It should be clear
from reading.
Is there a reason we can't use a case class
case class ResponseAction
case object SendAction extends ResponseAction
case object NoOpAction extends ResponseAction
case object CloseConnectionAction extends ResponseAction
Then to use it
response.action match {
case SendAction => do send
case NoOpAction => read more
case CloseConnectionAction => something
}
This seems clearer to me and I don't think it is significantly more expensive.
Can we also standardize the usage so that we no longer have the user EITHER
give null or NoResponse? It should be one or the other.
3. This logging "Cancelling the request key to notify socket server close the
connection due to error handling produce request " is not informative to the
user. What does it mean to cancel a key? What broke? What should they do? I
also think this should be info unless we want the server admin to take some
action (I don't think so, right? This is a normal occurance).
SocketServer.scala
4. The comment "a null response send object" is retained but we are no longer
using null to indicate this we are using RequestChannel.NoResponse. I think
this comment is actually a little verbose given that we now have a nicely named
response action.
ProducerTest.scala:
5. org.scalatest.TestFailedException: Is there a reason you are giving the full
path here instead of importing it
Question on testing, what is the message loss rate with acks=0 under moderate
load if we do something like a controlled shutdown with other replicas
available?
> After a leader change, messages sent with ack=0 are lost
> --------------------------------------------------------
>
> Key: KAFKA-955
> URL: https://issues.apache.org/jira/browse/KAFKA-955
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Rosenberg
> Assignee: Guozhang Wang
> Attachments: KAFKA-955.v1.patch, KAFKA-955.v1.patch,
> KAFKA-955.v2.patch, KAFKA-955.v3.patch, KAFKA-955.v4.patch,
> KAFKA-955.v5.patch, KAFKA-955.v6.patch
>
>
> If the leader changes for a partition, and a producer is sending messages
> with ack=0, then messages will be lost, since the producer has no active way
> of knowing that the leader has changed, until it's next metadata refresh
> update.
> The broker receiving the message, which is no longer the leader, logs a
> message like this:
> Produce request with correlation id 7136261 from client on partition
> [mytopic,0] failed due to Leader not local for partition [mytopic,0] on
> broker 508818741
> This is exacerbated by the controlled shutdown mechanism, which forces an
> immediate leader change.
> A possible solution to this would be for a broker which receives a message,
> for a topic that it is no longer the leader for (and if the ack level is 0),
> then the broker could just silently forward the message over to the current
> leader.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira