[
https://issues.apache.org/jira/browse/KAFKA-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059891#comment-15059891
]
Ismael Juma commented on KAFKA-2992:
------------------------------------
I had a look at the code in question and I'm surprised by the findings. The
code for `processPartitionData` follows:
{code}
val TopicAndPartition(topic, partitionId) = topicAndPartition
val replica = replicaMgr.getReplica(topic, partitionId).get
val messageSet = partitionData.toByteBufferMessageSet
warnIfMessageOversized(messageSet)

if (fetchOffset != replica.logEndOffset.messageOffset)
  throw new RuntimeException("Offset mismatch: fetched offset = %d, log end offset = %d."
    .format(fetchOffset, replica.logEndOffset.messageOffset))

trace("Follower %d has replica log end offset %d for partition %s. Received %d messages and leader hw %d"
  .format(replica.brokerId, replica.logEndOffset.messageOffset, topicAndPartition,
    messageSet.sizeInBytes, partitionData.highWatermark))
replica.log.get.append(messageSet, assignOffsets = false)
trace("Follower %d has replica log end offset %d after appending %d bytes of messages for partition %s"
  .format(replica.brokerId, replica.logEndOffset.messageOffset, messageSet.sizeInBytes, topicAndPartition))

val followerHighWatermark = replica.logEndOffset.messageOffset.min(partitionData.highWatermark)
// for the follower replica, we do not need to keep
// its segment base offset the physical position,
// these values will be computed upon making the leader
replica.highWatermark = new LogOffsetMetadata(followerHighWatermark)
trace("Follower %d set replica high watermark for partition [%s,%d] to %s"
  .format(replica.brokerId, topic, partitionId, followerHighWatermark))
{code}
There are a number of other allocations there, so I don't see why the thunk
allocations alone would be responsible for 98% of the allocations by object
count (as per the original description). If we actually want to solve the issue
at hand, I think more investigation is needed, and we could do more to reduce
allocations. As it stands, though, it is unclear whether there is a real issue
or whether it was just a profiler artifact (personally, I never trust profiler
data in isolation; it needs to be verified via other means too).
> Trace log statements in the replica fetcher inner loop create large amounts of garbage
> --------------------------------------------------------------------------------------
>
> Key: KAFKA-2992
> URL: https://issues.apache.org/jira/browse/KAFKA-2992
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Affects Versions: 0.8.2.1, 0.9.0.0
> Environment: Centos 6, Java 1.8.0_20
> Reporter: Cory Kolbeck
> Priority: Minor
> Labels: garbage, logging, trace
> Fix For: 0.9.1.0
>
>
> We're seeing some GC pause issues in production, and during our investigation
> we found that the thunks created by the three trace statements guarded in the
> attached PR were responsible for ~98% of all allocations by object count and
> ~90% by size. While I'm not sure this was actually the cause of our issue, it
> seems prudent to avoid useless allocations in a tight loop.
> I realize that the trace() call does its own level guarding internally;
> however, that is insufficient to prevent allocation of the thunk. I can work
> on getting profiling results to attach here, but I used YourKit and the
> license has since expired.
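For illustration only (the attached PR is not visible here), the guard the
description refers to would presumably look roughly like the following,
assuming an isTraceEnabled check is available from the surrounding Logging
trait:
{code}
// Sketch only, not the attached PR. An explicit level check at the call site
// skips both the Function0 thunk and the formatted string when TRACE is
// disabled; isTraceEnabled is assumed to be provided by the Logging trait.
if (isTraceEnabled)
  trace("Follower %d has replica log end offset %d after appending %d bytes of messages for partition %s"
    .format(replica.brokerId, replica.logEndOffset.messageOffset, messageSet.sizeInBytes, topicAndPartition))
{code}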
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)