Ottomata has submitted this change and it was merged.
Change subject: Experiment with different topic_request_required_acks settings
......................................................................
Experiment with different topic_request_required_acks settings
Occasionally the Kafka <-> ZooKeeper connection times out for broker
analytics1021.
When this happens, we see some message loss, but varnishkafka reports that
all messages have been sent. It is possible that analytics1021 is dropping
messages when it rejoins the ISR.
See Bug: 69667 for more information.
Change-Id: Id4deb19c4bfb720098109ba9e8e8f93020e0ad41
---
M manifests/role/cache.pp
1 file changed, 29 insertions(+), 0 deletions(-)
Approvals:
Ottomata: Looks good to me, approved
jenkins-bot: Verified
diff --git a/manifests/role/cache.pp b/manifests/role/cache.pp
index 6597ff4..ca0bc11 100644
--- a/manifests/role/cache.pp
+++ b/manifests/role/cache.pp
@@ -507,6 +507,34 @@
priority => 70,
}
+ # Trying out acks = -1 for select varnishes.
+ # We have always run with acks = 1, which means
+ # that only the leader of a partition needs to
+ # ACK a request for varnishkafka to consider it
+ # received by the Brokers. acks = -1 means
+ # that all Brokers in the partition's ISR must
+ # also ACK that they have received the produce
+ # request. This will mean higher latency, but
+ # less risk of losing messages due to broker
+ # problems. We also try with acks = 2,
+ # which would mean that at least 2 brokers
+ # would have to ack, rather than all of them.
+ # This is related to Bug 69667:
+ # https://bugzilla.wikimedia.org/show_bug.cgi?id=69667
+ #
+ # We will wait until the next time that analytics1021
+ # times out from ZooKeeper and examine the lost
+ # message count from the webrequest_sequence_stats
+ # table for these hosts, to see what the difference is.
+ $topic_request_required_acks = $::fqdn ? {
+ 'cp3019.esams.wikimedia.org' => '2', # esams bits
+ 'cp1056.eqiad.wmnet' => '2', # eqiad bits
+ 'cp1057.eqiad.wmnet' => '-1', # eqiad bits
+ 'cp3020.esams.wikimedia.org' => '-1', # esams bits
+ default => '1',
+ }
+
+
class { '::varnishkafka':
brokers => $kafka_brokers,
topic => $topic,
@@ -526,6 +554,7 @@
batch_num_messages => 6000,
# large timeout to account for potential cross DC latencies
topic_request_timeout_ms => 30000, # request ack timeout
+ topic_request_required_acks => $topic_request_required_acks,
# Write out stats to varnishkafka.stats.json
# this often. This is set at 15 so that
# stats will be fresh when polled from gmetad.
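As a rough illustration of the three acks settings described in the cache.pp comment, the sketch below models when a produce request counts as received. It follows the comment's own definitions and is not Kafka's or librdkafka's actual logic; the function name request_satisfied and the broker names broker-a and broker-b are hypothetical, and acks = 0 is deliberately left out because this change does not use it.

```python
# Simplified model of the acks settings described in the cache.pp comment.
# Illustrative only: not Kafka's or librdkafka's actual implementation.
from typing import Set


def request_satisfied(required_acks: int, isr: Set[str], acked: Set[str], leader: str) -> bool:
    """Return True once enough brokers have ACKed a produce request.

    required_acks = 1:  only the partition leader must ACK.
    required_acks = -1: every broker currently in the partition's ISR must ACK.
    required_acks = N > 1: at least N in-sync brokers must ACK (the comment's
    description of acks = 2).
    """
    if required_acks == 1:
        return leader in acked
    if required_acks == -1:
        # Subset test: all current ISR members are among the brokers that ACKed.
        return isr <= acked
    if required_acks > 1:
        # Count only ACKs from brokers that are in the ISR.
        return len(acked & isr) >= required_acks
    # acks = 0 (and any other value) is outside what this change uses.
    raise ValueError(f"unsupported required_acks value: {required_acks}")


if __name__ == "__main__":
    # Hypothetical partition: broker-a leads, all three brokers are in sync.
    isr = {"broker-a", "broker-b", "analytics1021"}
    only_leader = {"broker-a"}
    two_brokers = {"broker-a", "broker-b"}

    print(request_satisfied(1, isr, only_leader, "broker-a"))   # True
    print(request_satisfied(2, isr, only_leader, "broker-a"))   # False
    print(request_satisfied(2, isr, two_brokers, "broker-a"))   # True
    print(request_satisfied(-1, isr, two_brokers, "broker-a"))  # False

    # Once analytics1021 has dropped out of the ISR, acks = -1 no longer waits for it.
    shrunk_isr = {"broker-a", "broker-b"}
    print(request_satisfied(-1, shrunk_isr, two_brokers, "broker-a"))  # True
```

Under this model, acks = -1 only waits for brokers that are in the ISR at the time of the request, so if a broker such as analytics1021 has already dropped out of the ISR, the leader plus the remaining in-sync brokers are enough to satisfy the request.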
--
To view, visit https://gerrit.wikimedia.org/r/163744
Gerrit-MessageType: merged
Gerrit-Change-Id: Id4deb19c4bfb720098109ba9e8e8f93020e0ad41
Gerrit-PatchSet: 4
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: QChris
Gerrit-Reviewer: Ottomata
Gerrit-Reviewer: jenkins-bot <>
_______________________________________________
MediaWiki-commits mailing list
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits