Jason Gustafson created KAFKA-5957:
--------------------------------------

             Summary: Producer IllegalStateException due to second deallocate 
after aborting a batch
                 Key: KAFKA-5957
                 URL: https://issues.apache.org/jira/browse/KAFKA-5957
             Project: Kafka
          Issue Type: Bug
            Reporter: Jason Gustafson
            Assignee: Jason Gustafson
             Fix For: 1.0.0


Saw this recently in a system test failure:

{code}
[2017-09-21 05:04:52,033] ERROR [Producer clientId=producer-1, 
transactionalId=my-second-transactional-id] Aborting producer batches due to 
fatal error (org.apache.kafka.clients.producer.internals.Sender)
org.apache.kafka.common.KafkaException: The client hasn't received 
acknowledgment for some previously sent messages and can no longer retry them. 
It isn't safe to continue.
        at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211)
        at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:164)
        at java.lang.Thread.run(Thread.java:745)
[2017-09-21 05:04:52,033] TRACE Aborting batch for partition output-topic-2 
(org.apache.kafka.clients.producer.internals.ProducerBatch)
org.apache.kafka.common.KafkaException: The client hasn't received 
acknowledgment for some previously sent messages and can no longer retry them. 
It isn't safe to continue.
        at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:211)
        at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:164)
        at java.lang.Thread.run(Thread.java:745)
[2017-09-21 05:04:52,134] TRACE [Producer clientId=producer-1, 
transactionalId=my-second-transactional-id] Not sending transactional request 
(type=EndTxnRequest, transactionalId=my-second-transactional-id, 
producerId=1000, producerEpoch=0, result=COMMIT) because we are in an error 
state (org.apache.kafka.clients.producer.internals.TransactionManager)
[2017-09-21 05:04:52,134] INFO [Producer clientId=producer-1, 
transactionalId=my-second-transactional-id] Closing the Kafka producer with 
timeoutMillis = 9223372036854775807 ms. 
(org.apache.kafka.clients.producer.KafkaProducer)
[2017-09-21 05:04:52,134] DEBUG [Producer clientId=producer-1, 
transactionalId=my-second-transactional-id] Beginning shutdown of Kafka 
producer I/O thread, sending remaining records. 
(org.apache.kafka.clients.producer.internals.Sender)
[2017-09-21 05:04:52,360] TRACE [Producer clientId=producer-1, 
transactionalId=my-second-transactional-id] Received produce response from node 
1 with correlation id 245 (org.apache.kafka.clients.producer.internals.Sender)
[2017-09-21 05:04:52,360] DEBUG [Producer clientId=producer-1, 
transactionalId=my-second-transactional-id] ProducerId: 1000; Set last ack'd 
sequence number for topic-partition output-topic-2 to 136 
(org.apache.kafka.clients.producer.internals.Sender)
[2017-09-21 05:04:52,360] TRACE Successfully produced messages to 
output-topic-2 with base offset 387. 
(org.apache.kafka.clients.producer.internals.ProducerBatch)
[2017-09-21 05:04:52,360] DEBUG ProduceResponse returned for output-topic-2 
after batch had already been aborted. 
(org.apache.kafka.clients.producer.internals.ProducerBatch)
[2017-09-21 05:04:52,360] ERROR [Producer clientId=producer-1, 
transactionalId=my-second-transactional-id] Uncaught error in request 
completion: (org.apache.kafka.clients.NetworkClient)
java.lang.IllegalStateException: Remove from the incomplete set failed. This 
should be impossible.
        at 
org.apache.kafka.clients.producer.internals.IncompleteBatches.remove(IncompleteBatches.java:44)
        at 
org.apache.kafka.clients.producer.internals.RecordAccumulator.deallocate(RecordAccumulator.java:612)
        at 
org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:585)
        at 
org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:561)
        at 
org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:475)
        at 
org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74)
        at 
org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:685)
        at 
org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:101)
        at 
org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:481)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:473)
        at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:225)
        at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:177)
        at java.lang.Thread.run(Thread.java:745)
{code}
Although we allow a batch to be aborted before it returns, we are not careful 
about preventing a second call to {{deallocate()}} which causes this error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to