[ https://issues.apache.org/jira/browse/KAFKA-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497392#comment-14497392 ]
Tim Brooks commented on KAFKA-2102:
-----------------------------------
I added an updated patch. This patch includes a few things:
1. I moved to a finer-grained locking strategy, as opposed to attempting to use
only atomic operations. None of the methods are synchronized.
2. I delegated the synchronization code, and the data about when the last
update happened, etc., to a new MetadataBookkeeper. When I first read the old
code, I had some trouble untangling the mixture of cluster state, topic state,
state about when to do the next update, and state about when the last update
had completed. Maybe my changes make this easier to follow. Maybe they don't.
3. I moved lastNoNodeAvailableMs from the NetworkClient state into the
MetadataBookkeeper. This variable essentially recorded a failed attempt to
update metadata, and it was not accessed any differently for distinct metrics,
so it seemed nicer to keep all the state about when the next metadata update
should happen in one place.
4. No one has responded to KAFKA-2101, but it is highly relevant to what I was
working on, so this patch affects it. I created a distinction between a
successful metadata update and a metadata update attempt. The metadata-age
metric only uses the last successful update in its reports. Based on the name
of that metric, this seemed like the correct approach, since a failed update
does not make the metadata any younger.
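The four changes above can be pictured with a small, self-contained class. This is a hypothetical illustration, not code from the attached patch: the class name MetadataBookkeeper comes from the comment, but every field and method name here (requestUpdate, failedAttempt, successfulUpdate, timeToNextUpdate, metadataAgeMs) and the refreshBackoffMs/metadataExpireMs parameters are assumptions. It shows refresh-timing state guarded by its own ReentrantLock instead of a class-wide monitor, a failed attempt (the role lastNoNodeAvailableMs played) tracked alongside the other timing state, and a metadata age computed only from the last successful update.

```java
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical bookkeeper that owns only the "when to refresh" state,
 * kept apart from cluster and topic state. Names are illustrative and
 * are not taken from the KAFKA-2102 patch.
 */
final class MetadataBookkeeper {
    // Guards only the timing fields below, not the whole metadata object.
    private final ReentrantLock lock = new ReentrantLock();
    private final long refreshBackoffMs;
    private final long metadataExpireMs;

    private long lastAttemptMs = 0L;          // any attempt, successful or failed
    private long lastSuccessfulUpdateMs = 0L; // only attempts that produced new metadata
    private boolean needUpdate = false;

    MetadataBookkeeper(long refreshBackoffMs, long metadataExpireMs) {
        this.refreshBackoffMs = refreshBackoffMs;
        this.metadataExpireMs = metadataExpireMs;
    }

    void requestUpdate() {
        lock.lock();
        try {
            needUpdate = true;
        } finally {
            lock.unlock();
        }
    }

    /** A failed attempt (e.g. no node available): delays the retry, does not refresh the data. */
    void failedAttempt(long nowMs) {
        lock.lock();
        try {
            lastAttemptMs = nowMs;
        } finally {
            lock.unlock();
        }
    }

    void successfulUpdate(long nowMs) {
        lock.lock();
        try {
            lastAttemptMs = nowMs;
            lastSuccessfulUpdateMs = nowMs;
            needUpdate = false;
        } finally {
            lock.unlock();
        }
    }

    /** Milliseconds until an update is both due (requested or expired) and allowed by the backoff. */
    long timeToNextUpdate(long nowMs) {
        lock.lock();
        try {
            long timeToExpire = needUpdate
                    ? 0L
                    : Math.max(lastSuccessfulUpdateMs + metadataExpireMs - nowMs, 0L);
            long timeToAllowAttempt = Math.max(lastAttemptMs + refreshBackoffMs - nowMs, 0L);
            return Math.max(timeToExpire, timeToAllowAttempt);
        } finally {
            lock.unlock();
        }
    }

    /** Value a metadata-age metric would report: based on successful updates only. */
    long metadataAgeMs(long nowMs) {
        lock.lock();
        try {
            return nowMs - lastSuccessfulUpdateMs;
        } finally {
            lock.unlock();
        }
    }
}
```

Because failedAttempt advances only lastAttemptMs, a failure pushes the next retry out by the backoff without making metadataAgeMs any smaller, which is the behaviour argued for in point 4.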
The performance improvements are primarily in the 90th percentile and above. I
ran a producer test with both five and eight threads pushing 10,000 messages to
Kafka, repeated it ten times, and recorded the latencies with HDRHistogram.
Latency in the 90+ percentiles was reduced by somewhere between roughly 4% and
30%. For example, at the 0.990625000000 percentile in the five-thread test, the
latency was reduced from 14.223 microseconds to 9.775 (31%). At the
0.900000000000 percentile, the latency was reduced from 2.947 to 2.837 (3.7%).
So certainly not a lot, but the latency is pretty consistently improved across
the higher percentiles.
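The percentage reductions quoted above are computed relative to the pre-patch latency, i.e. (before - after) / before. A minimal Java sketch of that arithmetic, using only the numbers from the comment (the class and method names are illustrative):

```java
import java.util.Locale;

final class LatencyReduction {
    /** Percentage reduction relative to the baseline ("before") value. */
    static double percentReduction(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps the decimal separator as "." regardless of the JVM default.
        System.out.println(String.format(Locale.ROOT,
                "0.990625 percentile: %.1f%%", percentReduction(14.223, 9.775))); // prints "0.990625 percentile: 31.3%"
        System.out.println(String.format(Locale.ROOT,
                "0.9 percentile: %.1f%%", percentReduction(2.947, 2.837)));       // prints "0.9 percentile: 3.7%"
    }
}
```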
In the five-thread test the mean decreased by 4.8%. In the eight-thread test
the mean decreased by 7.8%.
The code for the latency test can be found here:
https://github.com/tbrooks8/kafka-latency-test
> Remove unnecessary synchronization when managing metadata
> ---------------------------------------------------------
>
> Key: KAFKA-2102
> URL: https://issues.apache.org/jira/browse/KAFKA-2102
> Project: Kafka
> Issue Type: Improvement
> Reporter: Tim Brooks
> Assignee: Tim Brooks
> Attachments: KAFKA-2102.patch, KAFKA-2102_2015-04-08_00:20:33.patch
>
>
> Usage of the org.apache.kafka.clients.Metadata class is synchronized. It
> seems like the current functionality could be maintained without
> synchronizing the whole class.
> I have been working on improving this by moving to finer-grained locks and
> using atomic operations. My initial benchmarking of the producer suggests that
> this will improve message-submission latency (measured with HDRHistogram).
> I have produced an initial patch. I do not necessarily believe it is
> complete, and I definitely want to produce more benchmarks. However, I
> wanted to get early feedback because this change could be deceptively tricky.
> I am interested in knowing:
> 1. Whether this is something of interest to the maintainers/community.
> 2. Whether it is along the right track.
> 3. Whether there are any gotchas that make my current approach naive.
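The description contrasts a fully synchronized Metadata class with finer-grained locking plus atomic operations. One common way to get lock-free reads is to publish immutable snapshots through a volatile field, while writers serialize on a dedicated lock. The sketch below is a generic illustration of that copy-on-write technique, not the code in the attached patches; ClusterSnapshotHolder and its topic-to-partition-count map are hypothetical stand-ins for Kafka's cluster state.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical holder: lock-free reads, writers serialized on their own lock. */
final class ClusterSnapshotHolder {
    // Immutable snapshot (topic -> partition count), published by a volatile write.
    private volatile Map<String, Integer> snapshot = Collections.emptyMap();
    private final ReentrantLock writeLock = new ReentrantLock();

    /** Hot path: readers never block, they just read the latest published snapshot. */
    Map<String, Integer> fetch() {
        return snapshot;
    }

    /** Slow path: copy, modify, then publish, so concurrent writers cannot lose updates. */
    void update(String topic, int partitions) {
        writeLock.lock();
        try {
            Map<String, Integer> next = new HashMap<>(snapshot);
            next.put(topic, partitions);
            snapshot = Collections.unmodifiableMap(next);
        } finally {
            writeLock.unlock();
        }
    }
}
```

This trades a map copy per write for reads that take no lock at all, which suits state that is read far more often than it is written. A reader that already holds an old snapshot keeps seeing a consistent (if stale) view.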