[ https://issues.apache.org/jira/browse/KAFKA-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059891#comment-15059891 ]
Ismael Juma commented on KAFKA-2992:
------------------------------------

I had a look at the code in question and I'm surprised by the findings. The code for `processPartitionData` follows:

{code}
val TopicAndPartition(topic, partitionId) = topicAndPartition
val replica = replicaMgr.getReplica(topic, partitionId).get
val messageSet = partitionData.toByteBufferMessageSet
warnIfMessageOversized(messageSet)

if (fetchOffset != replica.logEndOffset.messageOffset)
  throw new RuntimeException("Offset mismatch: fetched offset = %d, log end offset = %d.".format(fetchOffset, replica.logEndOffset.messageOffset))

trace("Follower %d has replica log end offset %d for partition %s. Received %d messages and leader hw %d"
  .format(replica.brokerId, replica.logEndOffset.messageOffset, topicAndPartition, messageSet.sizeInBytes, partitionData.highWatermark))
replica.log.get.append(messageSet, assignOffsets = false)
trace("Follower %d has replica log end offset %d after appending %d bytes of messages for partition %s"
  .format(replica.brokerId, replica.logEndOffset.messageOffset, messageSet.sizeInBytes, topicAndPartition))

val followerHighWatermark = replica.logEndOffset.messageOffset.min(partitionData.highWatermark)
// for the follower replica, we do not need to keep
// its segment base offset the physical position,
// these values will be computed upon making the leader
replica.highWatermark = new LogOffsetMetadata(followerHighWatermark)
trace("Follower %d set replica high watermark for partition [%s,%d] to %s"
  .format(replica.brokerId, topic, partitionId, followerHighWatermark))
{code}

There are a number of allocations there, so I don't see why the thunk allocations would be responsible for ~98% of allocations by object count (as per the original description). If we actually want to solve the issue at hand, I think more investigation is needed, and we could do more to reduce allocations. As it stands, it is unclear whether there is a real issue or just a profiler artifact (personally, I never trust profiler data in isolation; it needs to be verified by other means too).

> Trace log statements in the replica fetcher inner loop create large amounts of garbage
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2992
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2992
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.8.2.1, 0.9.0.0
>         Environment: Centos 6, Java 1.8.0_20
>            Reporter: Cory Kolbeck
>            Priority: Minor
>              Labels: garbage, logging, trace
>             Fix For: 0.9.1.0
>
>
> We're seeing some GC pause issues in production, and during our investigation we found that the thunks created during invocation of the three trace statements guarded in the attached PR were responsible for ~98% of all allocations by object count and ~90% by size. While I'm not sure that this was actually the cause of our issue, it seems prudent to avoid useless allocations in a tight loop.
> I realize that the trace() call does its own guarding internally; however, it's insufficient to prevent allocation of the thunk. I can work on getting profiling results to attach here, but I used YourKit and the license has since expired.
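For readers less familiar with Scala's by-name parameters, here is a minimal sketch of the allocation pattern the description refers to, assuming the logger's trace method takes a by-name (`=> String`) message as the description implies. The `Logging` trait and names below are illustrative stand-ins, not the actual `kafka.utils.Logging` code: the internal level check skips formatting the message, but the Function0 thunk wrapping the argument is still allocated at every call, which is what guarding at the call site (as the attached PR reportedly does) avoids.

{code}
// A minimal, self-contained sketch of the allocation pattern described above.
// The Logging trait here is an illustrative stand-in, NOT the actual
// kafka.utils.Logging code.
trait Logging {
  @volatile var traceEnabled: Boolean = false

  // By-name parameter: call sites like the ones below wrap the argument in a
  // Function0 thunk (a fresh allocation per call, since the argument captures
  // locals). The guard below skips *evaluating* (formatting) the message,
  // but the thunk object has already been allocated by the caller.
  def trace(msg: => String): Unit =
    if (traceEnabled) println(msg)
}

object ReplicaFetchSketch extends Logging {
  // Stand-in for one pass of the replica fetcher inner loop.
  def processOnce(fetchOffset: Long, logEndOffset: Long): Unit = {
    // Allocates a thunk on every call, even when trace is disabled;
    // only the String.format work is skipped.
    trace("fetched offset = %d, log end offset = %d".format(fetchOffset, logEndOffset))

    // Guarding at the call site (the approach the attached PR reportedly takes)
    // avoids the thunk allocation entirely when tracing is off.
    if (traceEnabled)
      trace("fetched offset = %d, log end offset = %d".format(fetchOffset, logEndOffset))
  }

  def main(args: Array[String]): Unit =
    (0L until 1000000L).foreach(i => processOnce(i, i + 1))
}
{code}

Whether those thunks actually dominate the allocation profile is, of course, the open question raised in the comment above.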