Alexey Raga created KAFKA-3656:
----------------------------------

             Summary: Avoid stressing system more when already under stress
                 Key: KAFKA-3656
                 URL: https://issues.apache.org/jira/browse/KAFKA-3656
             Project: Kafka
          Issue Type: Bug
            Reporter: Alexey Raga


I am working with Kafka Connect and I am seeing error messages like this:

{code}
[2016-05-04 03:11:28,226] ERROR Failed to flush WorkerSourceTask{id=geo-connector-0}, timed out while waiting for producer to flush outstanding messages, 151860 left ([FAILED toString()]) (org.apache.kafka.connect.runtime.WorkerSourceTask:237)
[2016-05-04 03:11:28,227] ERROR Failed to commit offsets for WorkerSourceTask{id=geo-connector-0} (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:112)
{code}

I haven't figured out why Connect would pull so many records into memory 
when it clearly can't produce them that fast, and I don't yet know why 
producing messages is slow.

But the {{151860 left ([FAILED toString()])}} part is interesting, so I 
looked at the code and found this:

{code}
if (timeoutMs <= 0) {
    log.error(
            "Failed to flush {}, timed out while waiting for producer to flush outstanding "
                    + "messages, {} left ({})", this, outstandingMessages.size(), outstandingMessages);
    finishFailedFlush();
    return false;
}
{code}

So when the connector is under stress and, assuming {{151860}} messages, under 
heavy memory pressure, the code chooses to take pretty much {{4 * 151860}} byte 
arrays and convert them to a Java string.
This not only eats more memory and adds GC pressure, but is also useless for 
logging, because the resulting string, if the conversion didn't fail, would look like:

{code}
(topic=lamington--geo-connector, partition=null, key=null, 
value=[B@62c66f62=ProducerRecord(topic=lamington--geo-connector, 
partition=null, key=null, value=[B@62c66f62, 
ProducerRecord(topic=lamington--geo-connector, partition=null, key=null, .....
{code}
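
The {{[B@62c66f62}} fragments above are consistent with Java's default {{Object.toString()}} for arrays, which prints a type tag and an identity hash rather than the contents. A minimal sketch to illustrate this ({{ByteArrayToString}} and {{render}} are hypothetical names used only here, not Kafka code):

```java
import java.nio.charset.StandardCharsets;

final class ByteArrayToString {
    // Returns the default Object.toString() of a byte array. This is what a
    // toString() built with string concatenation embeds for a byte[] field:
    // the class tag "[B", an "@", and the identity hash in hex, not the bytes.
    static String render(byte[] value) {
        return String.valueOf((Object) value);
    }

    public static void main(String[] args) {
        byte[] value = "hello".getBytes(StandardCharsets.UTF_8);
        // Prints something of the form [B@<hex hash>; the hash varies per run.
        System.out.println(render(value));
    }
}
```

So even a successful conversion would tell the reader nothing about the record payloads.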

I think this is a bug: the string representation of the outstanding messages 
should be removed from the log message.
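
A rough sketch of what the suggested change could look like, reporting only the count of outstanding messages. It assumes the outstanding records are held in a {{Map}}, and it uses {{String.format}} instead of the real logger so that it stays self-contained; {{FlushTimeoutMessage}} and {{describe}} are hypothetical names, not part of Kafka:

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class FlushTimeoutMessage {
    // Hypothetical helper: builds the timeout message from the number of
    // outstanding records only, so the collection itself is never stringified.
    static String describe(String task, Map<?, ?> outstandingMessages) {
        return String.format(
                "Failed to flush %s, timed out while waiting for producer to flush "
                        + "outstanding messages, %d left",
                task, outstandingMessages.size());
    }

    public static void main(String[] args) {
        // Assumption for illustration: records tracked by identity, payloads as byte[].
        Map<byte[], byte[]> outstanding = new IdentityHashMap<>();
        for (int i = 0; i < 3; i++) {
            byte[] payload = new byte[] {(byte) i};
            outstanding.put(payload, payload);
        }
        System.out.println(describe("WorkerSourceTask{id=geo-connector-0}", outstanding));
    }
}
```

Calling {{size()}} is cheap regardless of how many records are pending, so the error path no longer allocates in proportion to the backlog.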



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
