Elukey has submitted this change and it was merged.

Change subject: Tune varnishkafka-webrequest parameters

Tune varnishkafka-webrequest parameters

The Analytics team discovered a lot of webrequests missing the end
datetime field, which resulted in data consistency errors.
Varnishlog has been used on various cp hosts with the following
configuration to spot anomalies:

sudo varnishlog -c -n frontend -L 5000 -T 1500 \
  -q 'VSL or (Timestamp:Start and not Timestamp:Resp)' | tee timeouts.txt

The VSL timeout settings (-L and -T) are the same as those used by Varnishkafka.
This query asks for any request that is either logged with a VSL timeout
or with a Start timestamp but not a Resp one. Two things came up:

1) A lot of requests with the HttpGarbage tag are discarded by Varnish but
logged. Example:

*   << Request  >> 173757699
-   Begin          req 173757698 rxreq
-   Timestamp      Start: 1476449182.479356 0.000000 0.000000
-   Timestamp      Req: 1476449182.479356 0.000000 0.000000
-   BogoHeader     Header has ctrl char 0x1f
-   HttpGarbage    "GET%00"
-   ReqAcct        643 0 643 28 0 28
-   End

2) The VSL store overflow error is still present but happens less frequently.
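
A rough way to tally the cases above from the captured timeouts.txt is
sketched below. It assumes the default varnishlog text layout shown in the
example transaction; the function names and the category labels are
illustrative and not part of Varnish or Varnishkafka.

```python
import re
from collections import Counter

# Transaction header, e.g. "*   << Request  >> 173757699"
HEADER = re.compile(r"^\*+\s+<<\s*(\w+)\s*>>\s+(\d+)")
# Record line, e.g. "-   Timestamp      Start: 1476449182.479356 0.000000 0.000000"
RECORD = re.compile(r"^-+\s+(\S+)\s*(.*)$")

# The transaction quoted in the commit message, used as sample input.
SAMPLE = """\
*   << Request  >> 173757699
-   Begin          req 173757698 rxreq
-   Timestamp      Start: 1476449182.479356 0.000000 0.000000
-   Timestamp      Req: 1476449182.479356 0.000000 0.000000
-   BogoHeader     Header has ctrl char 0x1f
-   HttpGarbage    "GET%00"
-   ReqAcct        643 0 643 28 0 28
-   End
"""


def parse_transactions(lines):
    """Yield (vxid, [(tag, payload), ...]) for each transaction block."""
    vxid, records = None, []
    for line in lines:
        line = line.rstrip("\n")
        header = HEADER.match(line)
        if header:
            if vxid is not None:
                yield vxid, records
            vxid, records = int(header.group(2)), []
            continue
        record = RECORD.match(line)
        if record and vxid is not None:
            records.append((record.group(1), record.group(2)))
    if vxid is not None:
        yield vxid, records


def classify(records):
    """Label one transaction; the labels are hypothetical names for this sketch."""
    tags = {tag for tag, _ in records}
    timestamps = {
        payload.split(":", 1)[0] for tag, payload in records if tag == "Timestamp"
    }
    if "HttpGarbage" in tags:
        return "http_garbage"
    if "VSL" in tags:
        return "vsl"
    if "Start" in timestamps and "Resp" not in timestamps:
        return "missing_resp"
    return "other"


def tally(lines):
    """Count transactions per category."""
    return Counter(classify(records) for _, records in parse_transactions(lines))
```

For a real capture, something like tally(open("timeouts.txt")) would give
the per-category counts; the sample above falls into the HttpGarbage bucket.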

The proposed solution for 1) is to avoid logging any request with
the HttpGarbage tag. For 2), the maximum number of incomplete requests
kept in memory (-L) is raised from 5000 to 10000.
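
To illustrate why a higher -L limit reduces forced completions, here is a
toy model of a bounded store of incomplete transactions. It is only a
sketch of the idea (the oldest pending transaction is flushed when the
limit is reached), not the VSL API implementation; the class and function
names are made up for this example.

```python
from collections import OrderedDict


class IncompleteStore:
    """Toy model of a bounded store of incomplete transactions (like -L)."""

    def __init__(self, limit):
        self.limit = limit
        self.pending = OrderedDict()  # vxid -> tags seen so far, oldest first
        self.forced = []              # vxids flushed before their End record

    def add_record(self, vxid, tag):
        """Store one tag; return the complete tag list on 'End', else None."""
        if tag == "Begin":
            if len(self.pending) >= self.limit:
                # Store is full: force-complete the oldest transaction.
                oldest, _ = self.pending.popitem(last=False)
                self.forced.append(oldest)
            self.pending[vxid] = [tag]
            return None
        if vxid not in self.pending:
            # Late record for a transaction that was already flushed.
            return None
        self.pending[vxid].append(tag)
        if tag == "End":
            return self.pending.pop(vxid)
        return None


def count_forced(limit, concurrent):
    """Open `concurrent` transactions, then end them all; count forced flushes."""
    store = IncompleteStore(limit)
    for vxid in range(concurrent):
        store.add_record(vxid, "Begin")
    for vxid in range(concurrent):
        store.add_record(vxid, "End")
    return len(store.forced)
```

In this model, 8000 simultaneously open transactions force 3000
completions with a limit of 5000 and none with a limit of 10000. Real
traffic is of course not this regular.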

Bug: T148412
Change-Id: I68ada5789a848a676989c08590819625740b6bd8
M modules/role/manifests/cache/kafka/webrequest.pp
1 file changed, 11 insertions(+), 6 deletions(-)

  Elukey: Looks good to me, approved
  Ottomata: Looks good to me, but someone else must approve
  Ema: Looks good to me, but someone else must approve
  jenkins-bot: Verified

diff --git a/modules/role/manifests/cache/kafka/webrequest.pp b/modules/role/manifests/cache/kafka/webrequest.pp
index 65fcdd7..f541861 100644
--- a/modules/role/manifests/cache/kafka/webrequest.pp
+++ b/modules/role/manifests/cache/kafka/webrequest.pp
@@ -15,10 +15,11 @@
     # Set varnish.arg.q or varnish.arg.m according to Varnish version
     if (hiera('varnish_version4', false)) {
-        # Background from T136314:
+        # Background task: T136314
+        # Background info about the parameters used:
         # 'q':
-        # Filter out PURGE requests and Pipe creation traffic.
-        # A Varnish log containing Timestamp:Pipe does not carry 
+        # 1) Filter out PURGE requests and Pipe creation traffic.
+        # 2) A Varnish log containing Timestamp:Pipe does not carry 
         # used by Analytics to bucket data on Hadoop and for data consistency
         # checks. These requests indicate that Varnish tried to establish a 
         # channel between the client and the backend, an information that
@@ -30,13 +31,15 @@
         # At the moment these requests get logged incorrectly and with partial
         # data (due to the VSL timeout) so it makes sense to filter them out to
         # remove noise from Analytics data.
+        # 3) A request marked with the VSL tag 'HttpGarbage' indicates 
+        # HTTP requests, generating spurious Varnish logs.
         # 'T':
         # VLS API timeout is the maximum time that Varnishkafka will wait 
         # "Begin" and "End" timestamps before flushing the available tags to a 
         # When a timeout occurs most of the times the result is a webrequest 
         # missing values like the end timestamp.
-        # Parameters modified during the upload migration:
+        # VSL Timeout parameters modified during the upload migration:
         # 'L':
         # Sets the upper limit of incomplete transactions kept before the 
         # one is force completed. This setting keeps an upper bound
@@ -44,14 +47,16 @@
         # A change in the -T timeout value has the side effect of keeping more
         # incomplete transactions in memory for each varnishkafka query (in our case
         # it directly corresponds to a varnishkafka instance running).
+        # The threshold has been raised to '5000' the first time (which removed
+        # the bulk of the timeouts) and to '10000' the second time.
         # 'T':
         # Raised the maximum timeout for incomplete records from '700' to 
         # after setting the -L to '5000'. VSL timeouts were masked
         # by VSL store overflow errors.
         $varnish_opts = {
-            'q' => 'ReqMethod ne "PURGE" and not Timestamp:Pipe and not ReqHeader:Upgrade ~ "[wW]ebsocket"',
+            'q' => 'ReqMethod ne "PURGE" and not Timestamp:Pipe and not ReqHeader:Upgrade ~ "[wW]ebsocket" and not HttpGarbage',
             'T' => '1500',
-            'L' => '5000'
+            'L' => '10000'
         $conf_template = 'varnishkafka/varnishkafka_v4.conf.erb'
     } else {

To view, visit https://gerrit.wikimedia.org/r/316306

Gerrit-MessageType: merged
Gerrit-Change-Id: I68ada5789a848a676989c08590819625740b6bd8
Gerrit-PatchSet: 5
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Elukey <ltosc...@wikimedia.org>
Gerrit-Reviewer: BBlack <bbl...@wikimedia.org>
Gerrit-Reviewer: Elukey <ltosc...@wikimedia.org>
Gerrit-Reviewer: Ema <e...@wikimedia.org>
Gerrit-Reviewer: Ottomata <o...@wikimedia.org>
Gerrit-Reviewer: jenkins-bot <>

MediaWiki-commits mailing list
