Faidon Liambotis has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/243885

Change subject: Revert Varnish X-Analytics and netmapper changes
......................................................................

Revert Varnish X-Analytics and netmapper changes

Suspected to be involved into a much higer amount of backend traffic
(both to appservers and to Swift), ultimately resulting in an
imagescaler outage due to the high load.

There is a time correlation between the two events and Varnish Ganglia
metrics show a definite increase of various metrics such as struct sess
(on both backend and frontend). No other evidence yet.

This reverts commits:
 - a47a985fbafcce708c9d0c896dad05e4106d9d2f
 - d200332d2a8df72f5895aa7bdb8e4ea7f73e6412
 - e73eab0d80eeca86a764b8e689a8cba2e943edea
 - 941b3e6e97f0b09954f6eb218b358f237196ebdb
 - f092ee39eb89caf71fd34bd3a1f49e89b7a07e71

Change-Id: I6e4d784c28f38b227bc2661278b80e22d35dc98c
---
M modules/varnish/manifests/common/vcl.pp
M modules/varnish/templates/vcl/wikimedia.vcl.erb
D templates/varnish/analytics.inc.vcl.erb
A templates/varnish/last-access.inc.vcl.erb
M templates/varnish/mobile-frontend.inc.vcl.erb
A templates/varnish/provenance.inc.vcl.erb
M templates/varnish/text-frontend.inc.vcl.erb
A templates/varnish/via.inc.vcl.erb
M templates/varnish/zero.inc.vcl.erb
9 files changed, 373 insertions(+), 363 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet 
refs/changes/85/243885/1

diff --git a/modules/varnish/manifests/common/vcl.pp 
b/modules/varnish/manifests/common/vcl.pp
index 2ace903..41f63bd 100644
--- a/modules/varnish/manifests/common/vcl.pp
+++ b/modules/varnish/manifests/common/vcl.pp
@@ -9,11 +9,17 @@
     }
 
     file { '/etc/varnish/last-access.inc.vcl':
-        ensure => absent,
+        owner   => 'root',
+        group   => 'root',
+        mode    => '0444',
+        content => template('varnish/last-access.inc.vcl.erb'),
     }
 
     file { '/etc/varnish/provenance.inc.vcl':
-        ensure => absent,
+        owner   => 'root',
+        group   => 'root',
+        mode    => '0444',
+        content => template('varnish/provenance.inc.vcl.erb'),
     }
 
     file { '/etc/varnish/device-detection.inc.vcl':
@@ -27,8 +33,12 @@
         content => template('varnish/errorpage.inc.vcl.erb'),
     }
 
+
     file { '/etc/varnish/via.inc.vcl':
-        ensure => absent,
+        owner   => 'root',
+        group   => 'root',
+        mode    => '0444',
+        content => template('varnish/via.inc.vcl.erb'),
     }
 
     file { '/etc/varnish/hhvm.inc.vcl':
@@ -36,13 +46,6 @@
         owner  => 'root',
         group  => 'root',
         mode   => '0444',
-    }
-
-    file { '/etc/varnish/analytics.inc.vcl':
-        owner   => 'root',
-        group   => 'root',
-        mode    => '0444',
-        content => template('varnish/analytics.inc.vcl.erb'),
     }
 
     # VCL unit tests
diff --git a/modules/varnish/templates/vcl/wikimedia.vcl.erb 
b/modules/varnish/templates/vcl/wikimedia.vcl.erb
index c52f4c2..1257de3 100644
--- a/modules/varnish/templates/vcl/wikimedia.vcl.erb
+++ b/modules/varnish/templates/vcl/wikimedia.vcl.erb
@@ -7,11 +7,6 @@
 #   being set from different code.
 import header;
 
-<% if @vcl_config.fetch("layer", "") == "frontend" -%>
-# only used in recv_fe_ip_processing on frontends
-import netmapper;
-<% end %>
-
 <%
 def backend_option(backend, option, default)
        if @varnish_backend_options.kind_of?(Array)
@@ -47,11 +42,6 @@
 
 <% if @vcl_config.fetch( "enable_geoiplookup", false ) -%>
 include "geoip.inc.vcl";
-<% end -%>
-
-<% if @vcl_config.fetch("layer", "") == "frontend" -%>
-// defines analytics_(recv|deliver) subs
-include "analytics.inc.vcl";
 <% end -%>
 
 # ACLs
@@ -368,143 +358,30 @@
        }
 }
 
-<% if @vcl_config.fetch("layer", "") == "frontend" -%>
-// Must be done at the top of vcl_recv, in our varnish-frontend layer only,
-// and should be guarded against running on request restarts.
-sub recv_fe_ip_processing {
-       // this subroutine "owns" these 4 headers - nothing else in our VCL or
-       // anywhere in our network should be setting them.
-       unset req.http.X-Trusted-Proxy;
-       unset req.http.X-Client-IP;
-       unset req.http.X-Carrier;
-       unset req.http.X-Carrier-Meta;
-
-       if (client.ip !~ wikimedia_nets) {
-               // Ensure we only accept XFP from our own networks.  Ideally
-               // it should only be set by our nginx TLS terminator
-               // specifically, but there are known cases where internal apps
-               // set XFP to fake HTTPS when making a request to our public
-               // endpoints from the inside.
-               unset req.http.X-Forwarded-Proto;
-               // only the nginx TLS terminator should be setting this one at
-               // all - there are no other internal exceptions to that rule
-               unset req.http.X-Real-IP;
-       }
-
-       // There are two possible cases here: either nginx acted as our TLS
-       // proxy and already set X-Real-IP (as well as appended the same value
-       // as XFF), or the traffic was direct to varnish-fe, in which case
-       // XRIP is not yet set and XFF is directly from external.
-       if (!req.http.X-Real-IP) {
-               // direct-to-port-80 case, set XRIP ourselves
-               set req.http.X-Real-IP = client.ip;
-               set req.http.X-Trusted-Proxy = netmapper.map("proxies", 
req.http.X-Real-IP);
-               // normalize to boolean post-netmapper (varnish-3.0.4...)
-               if (req.http.X-Trusted-Proxy == "") {
-                       unset req.http.X-Trusted-Proxy;
-               }
-               if (req.http.X-Trusted-Proxy && req.http.X-Forwarded-For) {
-                       // get last from trusted-proxy-supplied XFF
-                       set req.http.X-Client-IP = 
regsub(req.http.X-Forwarded-For, "^([^,]+, ?)+", "");
-               }
-       } else {
-               // XRIP from nginx, XFF assumed to be set and contain at least
-               // XRIP at the end, possibly prepended by other addrs set
-               // externally by some proxy.
-               set req.http.X-Trusted-Proxy = netmapper.map("proxies", 
req.http.X-Real-IP);
-               // normalize to boolean post-netmapper (varnish-3.0.4...)
-               if (req.http.X-Trusted-Proxy == "") {
-                       unset req.http.X-Trusted-Proxy;
-               }
-               if (req.http.X-Trusted-Proxy) {
-                       // strip off final XFF entry (should alias XRIP, both 
set
-                       // by nginx) to set xffce, then take the final entry
-                       // of xffce as the client IP.  Note that if the
-                       // trusted proxy did not set an XFF header nginx would
-                       // have set it to just XRIP with no commas, and the
-                       // regsubs would result in setting XCIP = XRIP.
-                       set req.http.xffce = regsub(req.http.X-Forwarded-For, 
",[^,]+$", "");
-                       set req.http.X-Client-IP = regsub(req.http.xffce, 
"^([^,]+, ?)+", "");
-                       unset req.http.xffce; // clean up temp var
-               }
-       }
-
-       // If XCIP not set above, or looks invalid due to bad external XFF
-       // input, just alias to XRIP
-       if (!req.http.X-Client-IP || req.http.X-Client-IP !~ 
"^[0-9a-fA-F:.]+$") {
-               set req.http.X-Client-IP = req.http.X-Real-IP;
-       }
-
-       // Now check carrier database for setting X-Carrier based on XCIP
-       set req.http.X-Carrier = netmapper.map("carriers", 
req.http.X-Client-IP);
-       // normalize to boolean post-netmapper (varnish-3.0.4...)
-       if (req.http.X-Carrier == "") {
-               unset req.http.X-Carrier;
-       }
-       else {
-               // Split X-Carrier data from raw form with optional trailing 
metadata,
-               // such as "123-45|wap|mobile", so that X-Carrier contains only
-               // MCC-MNC and X-Carrier-Meta contains the trailing attributes
-               set req.http.X-Carrier-Meta = regsub(req.http.X-Carrier, 
"^[^|]*\|?", "");
-               if (req.http.X-Carrier-Meta != "") {
-                       set req.http.X-Carrier = regsub(req.http.X-Carrier, 
"\|.*$", "");
-               }
-               else {
-                       unset req.http.X-Carrier-Meta;
-               }
-       }
-
-       // From this (very early) point forward, regardless of cache tier/layer:
-       // client.ip   ->
-       //     the network-level source address, hop-by-hop - could be an
-       //     internal address within our infrastructure as we traverse
-       //     various cache/proxy layers.
-       // req.http.X-Real-IP   ->
-       //     the network-level source address when this request first
-       //     entered our public traffic infrastructure at the edge, with no
-       //     other decoding.  Could still be a trusted external proxy.
-       // req.http.X-Trusted-Proxy ->
-       //     If X-Real-IP mapped to the address of a trusted proxy in our
-       //     "proxies" database (such as OperaMini), this will be the
-       //     official name of the trusted proxy.  Otherwise it will be unset
-       //     (boolean false).
-       // req.http.X-Client-IP ->
-       //     Iff XTP above is set, *and* the trusted-proxy-supplied X-F-F
-       //     had a legitimate-looking address at the end, X-Client-IP will
-       //     contain the claimed client IP directly behind the trusted
-       //     proxy.  Otherwise this will alias X-Real-IP.
-       // req.http.X-Carrier ->
-       //     If X-Client-IP matches a network in our "carriers" database,
-       //     this will contain the MCC-MNC code for that carrier.  Otherwise
-       //     it will be undefined.
-       // req.http.X-Carrier-Meta ->
-       //     If X-Carrier is defined: for some carriers, the database
-       //     contains extra metadata in the form of one or more labels like
-       //     "wap" or "residential".  They'll be separated by "|" if more
-       //     than one, and this header is undefined if there was no such
-       //     metadata.
-}
-
-<% end %>
-
-sub vcl_init {
-<% if @vcl_config.fetch("layer", "") == "frontend" -%>
-       // again, netmapper only used in frontends, for recv_fe_ip_processing
-       // args here are map-name (for .map()), data file, and seconds between 
mtime checks for reload
-       netmapper.init("proxies", "/var/netmapper/proxies.json", 89);
-       netmapper.init("carriers", "/var/netmapper/carriers.json", 89);
-<% end %>
-}
-
-sub vcl_recv {
-       // IP processing is req->req mangling that shouldn't be re-done on
-       // restart, and XFF-appending is non-idempotent for restart purposes..
+// XFF / XFP / XRIP -handling (order is important below, should be at start of 
recv)
+// TODO: Integration with zero-like XFF-handling, to skip over authorized 
proxies for XRIP
+sub recv_ip_processing {
+       // Everything here is req-mangling based on the req
+       // itself, and some parts are non-idempotent, so do not
+       // re-run this on a request restart
        if (req.restarts == 0) {
-<% if @vcl_config.fetch("layer", "") == "frontend" -%>
-               call recv_fe_ip_processing;
-<% end %>
-               // All layers need to update XFF with client.ip hop-by-hop so 
that it
-               // looks right to layers beneath, including the app layer
+               if (client.ip !~ wikimedia_nets) {
+                       // Ensure we only accept XFP and XRIP from our nets.
+                       // Note this *should* be just the local tlsproxy
+                       // terminator, or passing through layers of varnish,
+                       // but we have known cases where applayer internal
+                       // things are faking XFP...
+                       unset req.http.X-Forwarded-Proto;
+                       unset req.http.X-Real-IP;
+               }
+
+               // Ensure XRIP is always set, in the case of varnish-fe and a
+               // direct external request (otherwise nginx will have set it).
+               if (!req.http.X-Real-IP) {
+                       set req.http.X-Real-IP = client.ip;
+               }
+
+               // All proxies need to update XFF with client.ip hop-by-hop
                if (req.http.X-Forwarded-For) {
                        set req.http.X-Forwarded-For = req.http.X-Forwarded-For 
+ ", " + client.ip;
                } else {
@@ -512,7 +389,18 @@
                }
        }
 
-<% if @vcl_config.fetch("layer", "") != "frontend" -%>
+       // From this (very early) point forward:
+       // client.ip -> the network-level source address, hop-by-hop
+       // X-Real-IP -> the network-level source address when this
+       // request first entered our infrastructure at the edge, without any
+       // regard to external XFF header processing, and does not become
+       // "wrong" from traversing our internal proxy layers.
+}
+
+sub vcl_recv {
+       call recv_ip_processing;
+
+<% if @vcl_config.fetch("layer", "frontend") != "frontend" -%>
        if (client.ip !~ wikimedia_nets) {
                // Do not allow direct access to non-frontend layers
                error 403 "Access denied";
@@ -565,11 +453,6 @@
        }
 <% end -%>
 
-<% if @vcl_config.fetch("layer", "") == "frontend" -%>
-       if(req.restarts == 0) {
-               call analytics_recv;
-       }
-<% end -%>
        /* Function vcl_recv in <%= @vcl %>.inc.vcl will be appended here */
 }
 
@@ -653,10 +536,6 @@
 
 <% if @vcl_config.fetch("layer", "") == "frontend" && 
@vcl_config.fetch("https_redirects", false) -%>
        call https_deliver_hsts;
-<% end -%>
-
-<% if @vcl_config.fetch("layer", "") == "frontend" -%>
-       call analytics_deliver;
 <% end -%>
 
        /* Function vcl_deliver in <%= @vcl %>.inc.vcl will be appended here */
diff --git a/templates/varnish/analytics.inc.vcl.erb 
b/templates/varnish/analytics.inc.vcl.erb
deleted file mode 100644
index a280e7c..0000000
--- a/templates/varnish/analytics.inc.vcl.erb
+++ /dev/null
@@ -1,181 +0,0 @@
-/*****************************************************************************
- * Varnish VCL for WMF-Last-Access Cookie
- * Please see what this cookie is trying to acomplish:
- * 
https://wikitech.wikimedia.org/wiki/Analytics/Unique_clients/Last_visit_solution
- *
- * General notes on timestamp format strings used here:
- * "now" stringifies as "Wed, 01 Jan 2000 01:01:01 GMT", which is the same
- * format used by Set-Cookie "Expires" data.  The format for the last access
- * value, and thus X-NowDay and X-WMF-LastStamp as well, is "01-Jan-2000"
- * (because the other info is redundant or too-specific, and cookie values
- * shouldn't have whitespace or commas).
- ****************************************************************************/
-
-/*****************************************************************************
- * This must be called *before* any vcl_recv cookie munging.  It more-properly
- * belongs in _deliver, but putting it here avoids all of the issues
- * surrounding consistent access to Cookie vs X-Orig-Cookie at vcl_deliver
- * time in the text/mobile cases.
- * It does so at the cost of sending a pointless and unintended
- * "X-WMF-LastStamp: 01-Jan-2000" header to the application layer as well on
- * cache miss/bypass.
- * Note we don't validate that the cookie's 3-letter month abbreviation is
- * legal, or that the numeric values for the date/year are legal, just that
- * they have the right count of the right kinds of characters.
- ****************************************************************************/
-sub analytics_last_access_recv {
-    unset req.http.X-WMF-LastStamp; // clear any sent by the user
-    if (req.http.Cookie ~ 
"(^|;\s*)WMF-Last-Access=[0-9]{2}-[A-Za-z]{3}-[0-9]{4}(;|$)") {
-        // Save the value for use later in _deliver
-        set req.http.X-WMF-LastStamp = regsub(
-            req.http.Cookie,
-            "^(?:.*;\s*)?WMF-Last-Access=([^;]+).*$",
-            "\1"
-        );
-    }
-}
-
-/*****************************************************************************
- * !!! private to analytics_last_access_deliver !!!!
- * This should be:
- *     header.append(resp.http.Set-Cookie,
- *         "WMF-Last-Access="
- *         + req.http.X-NowDay
- *         + ";Path=/;HttpOnly;Expires="
- *         + (now + 32d)
- *     );
- * However, varnish3 is buggy wrt str + (time + duration), so we're forced to
- * drop to inline C a bit here and do what the VCL compiler should have done
- * for us above.  On top of all that, the C code now floors the expiry to the
- * next-lower 12 hour mark, which would've been a bit trickier in VCL...
- ****************************************************************************/
-C{#include <time.h>}C
-sub set_last_access_cookie__ { C{
-    Vmod_Func_header.append(sp, HDR_RESP, "\013Set-Cookie:",
-        "WMF-Last-Access=",
-        VRT_GetHdr(sp, HDR_REQ, "\011X-NowDay:"),
-        ";Path=/;HttpOnly;Expires=",
-        VRT_time_string(sp, (double)(
-            ((time_t)VRT_r_now(sp) + 2764800) / 43200 * 43200
-        )),
-        vrt_magic_string_end
-    );
-}C }
-
-// Call from vcl_deliver near other X-Analytics code
-sub analytics_last_access_deliver {
-    // Create X-NowDay in "01-Jan-2000" form, from "now"
-    set req.http.X-NowDay = regsub(
-        now, "^..., (..) (...) (....) .*$", "\1-\2-\3"
-    );
-
-    if(req.http.X-WMF-LastStamp) {
-        set resp.http.X-Analytics = resp.http.X-Analytics
-            + ";WMF-Last-Access="
-            + req.http.X-WMF-LastStamp;
-
-        // re-set the cookie if it's not from today
-        if (req.http.X-NowDay != req.http.X-WMF-LastStamp) {
-            call set_last_access_cookie__;
-        }
-
-    }
-    else {
-        // sets the initial cookie if no valid one existed
-        call set_last_access_cookie__;
-    }
-
-    // we could clean up req.http.X-WMF-LastStamp + req.http.X-NowDay
-    // here, but they're not being sent anywhere (else) at this point
-    // anyways, so why bother?
-}
-
-// Analytics for analytics "wprov" Provenance data
-// See https://www.mediawiki.org/wiki/Provenance for reserved values.
-
-sub analytics_provenance_recv {
-    // Avoid cache fragmentation for well-formed provenance parameters
-    // Refer to discussion starting from
-    // 
https://lists.wikimedia.org/pipermail/analytics/2015-February/003426.html
-    // Look for wprov parameter with a value
-    if (req.url ~ "(?i)[?&]wprov=[^&]+") {
-        // Ready a variable for later X-Analytics tagging in vcl_deliver.
-
-        // Grab just the value of the wprov parameter, excluding the rest of 
the URL
-        set req.http.X-WMF-WPROV = regsub(req.url, 
"(?i).+[?&]wprov=([^&]+).*", "\1");
-
-        // Remove the wprov=X parameter from req.url to avoid cache
-        // fragmentation using two regexes to cover distinct cases:
-
-        // (1) Simple strip if final query arg:
-        set req.url = regsub(req.url, "(?i)[?&]wprov=[^&]+$", "");
-
-        // (2) When not the final arg, we need to capture the leading
-        //     [?&] to reuse with the parameter that follows:
-        set req.url = regsub(req.url, "(?i)([?&])wprov=[^&]+&", "\1");
-    }
-}
-
-sub analytics_provenance_deliver {
-    // In case there was a provenance parameter with a value, add it to 
X-Analytics
-    if (req.http.X-WMF-WPROV) {
-        set resp.http.X-Analytics = resp.http.X-Analytics + ";wprov=" + 
req.http.X-WMF-WPROV;
-    }
-}
-
-// Combined analytics recv and deliver hooks, to be included directly in 
vcl_recv and vcl_deliver
-sub analytics_recv {
-    call analytics_last_access_recv;
-    call analytics_provenance_recv;
-}
-
-sub analytics_deliver {
-    // Create empty header if none, to avoid tons of if/else clauses; will
-    // clean up at the end.  Note that if we defined one of the k=v pairs as
-    // required (having a real value for the false/negative case), we could
-    // set that one first and this would get a bit cleaner...
-    if (!resp.http.X-Analytics) {
-        set resp.http.X-Analytics = "";
-    }
-
-    call analytics_last_access_deliver;
-    call analytics_provenance_deliver;
-
-    if (req.http.X-Carrier) {
-        set resp.http.X-Analytics = resp.http.X-Analytics + ";zero=" + 
req.http.X-Carrier;
-        if (req.http.X-Carrier-Meta) {
-            set resp.http.X-Analytics = resp.http.X-Analytics + ";zeronet=" + 
req.http.X-Carrier-Meta;
-        }
-    }
-
-    if (req.http.X-Trusted-Proxy) {
-        set resp.http.X-Analytics = resp.http.X-Analytics + ";proxy=" + 
req.http.X-Trusted-Proxy;
-    }
-
-    if (req.http.X-Forwarded-Proto) {
-        set resp.http.X-Analytics = resp.http.X-Analytics + ";https=1";
-    }
-
-    if (req.http.X-WMF-UUID) {
-        set resp.http.X-Analytics = resp.http.X-Analytics + ";wmfuuid=" + 
req.http.X-WMF-UUID;
-    }
-
-    // Add proxy=IORG X-Analytics tag if appropriate.
-    // Although Via: Internet.org usually comes via proxying, it isn't 
guaranteed to come that way.
-    // Nonetheless, as it is tagged with Via and the equipment is under 
Internet.org, we proxy tag.
-    // Note, Internet.org is believed to apply to all Wikimedia sites, so this 
code should run not
-    // just for (m|zero).wikipedia.org and subdomains. Hence the inclusion of 
this file by both
-    // mobile-frontend.inc.vcl.erb and text-frontend.inc.vcl.erb, as opposed 
to a one-off in
-    // zero.inc.vcl.erb alone. See the notes at the top of 
mobile-frontend.inc.vcl.erb and
-    // text-frontend.inc.vcl.erb for more context.
-    if (req.http.Via ~ "(?i)Internet\.org") {
-        set resp.http.X-Analytics = resp.http.X-Analytics + ";proxy=IORG";
-    }
-
-    // Clean up header from setting to empty at the start...
-    if (resp.http.X-Analytics == "") {
-        unset resp.http.X-Analytics;
-    } else {
-        set resp.http.X-Analytics = regsub(resp.http.X-Analytics, "^;", "");
-    }
-}
diff --git a/templates/varnish/last-access.inc.vcl.erb 
b/templates/varnish/last-access.inc.vcl.erb
new file mode 100644
index 0000000..bdf4a76
--- /dev/null
+++ b/templates/varnish/last-access.inc.vcl.erb
@@ -0,0 +1,98 @@
+/*****************************************************************************
+ * Varnish VCL include file for WMF-Last-Access Cookie
+ * Please see what this cookie is trying to acomplish:
+ * 
https://wikitech.wikimedia.org/wiki/Analytics/Unique_clients/Last_visit_solution
+ *
+ * General notes on timestamp format strings used here:
+ * "now" stringifies as "Wed, 01 Jan 2000 01:01:01 GMT", which is the same
+ * format used by Set-Cookie "Expires" data.  The format for the last access
+ * value, and thus X-NowDay and X-WMF-LastStamp as well, is "01-Jan-2000"
+ * (because the other info is redundant or too-specific, and cookie values
+ * shouldn't have whitespace or commas).
+ ****************************************************************************/
+
+/*****************************************************************************
+ * This must be called *before* any vcl_recv cookie munging.  It more-properly
+ * belongs in _deliver, but putting it here avoids all of the issues
+ * surrounding consistent access to Cookie vs X-Orig-Cookie at vcl_deliver
+ * time in the text/mobile cases.
+ * It does so at the cost of sending a pointless and unintended
+ * "X-WMF-LastStamp: 01-Jan-2000" header to the application layer as well on
+ * cache miss/bypass.
+ * Note we don't validate that the cookie's 3-letter month abbreviation is
+ * legal, or that the numeric values for the date/year are legal, just that
+ * they have the right count of the right kinds of characters.
+ ****************************************************************************/
+sub analytics_last_access_recv {
+       if (req.restarts == 0) {
+               unset req.http.X-WMF-LastStamp; // clear any sent by the user
+               if (req.http.Cookie ~ 
"(^|;\s*)WMF-Last-Access=[0-9]{2}-[A-Za-z]{3}-[0-9]{4}(;|$)") {
+                       // Save the value for use later in _deliver
+                       set req.http.X-WMF-LastStamp = regsub(
+                               req.http.Cookie,
+                               "^(?:.*;\s*)?WMF-Last-Access=([^;]+).*$",
+                               "\1"
+                       );
+               }
+       }
+}
+
+/*****************************************************************************
+ * !!! private to analytics_last_access_deliver !!!!
+ * This should be:
+ *     header.append(resp.http.Set-Cookie,
+ *             "WMF-Last-Access="
+ *             + req.http.X-NowDay
+ *             + ";Path=/;HttpOnly;Expires="
+ *             + (now + 32d)
+ *     );
+ * However, varnish3 is buggy wrt str + (time + duration), so we're forced to
+ * drop to inline C a bit here and do what the VCL compiler should have done
+ * for us above.  On top of all that, the C code now floors the expiry to the
+ * next-lower 12 hour mark, which would've been a bit trickier in VCL...
+ ****************************************************************************/
+C{#include <time.h>}C
+sub set_last_access_cookie__ { C{
+       Vmod_Func_header.append(sp, HDR_RESP, "\013Set-Cookie:",
+               "WMF-Last-Access=",
+               VRT_GetHdr(sp, HDR_REQ, "\011X-NowDay:"),
+               ";Path=/;HttpOnly;Expires=",
+               VRT_time_string(sp, (double)(
+                       ((time_t)VRT_r_now(sp) + 2764800) / 43200 * 43200
+               )),
+               vrt_magic_string_end
+       );
+}C }
+
+// Call from vcl_deliver near other X-Analytics code
+sub analytics_last_access_deliver {
+       // Create X-NowDay in "01-Jan-2000" form, from "now"
+       set req.http.X-NowDay = regsub(
+               now, "^..., (..) (...) (....) .*$", "\1-\2-\3"
+       );
+
+       if(req.http.X-WMF-LastStamp) {
+               if (resp.http.X-Analytics) {
+                       set resp.http.X-Analytics = resp.http.X-Analytics
+                               + ";WMF-Last-Access="
+                               + req.http.X-WMF-LastStamp;
+               } else {
+                       set resp.http.X-Analytics = "WMF-Last-Access="
+                               + req.http.X-WMF-LastStamp;
+               }
+
+               // re-set the cookie if it's not from today
+               if (req.http.X-NowDay != req.http.X-WMF-LastStamp) {
+                       call set_last_access_cookie__;
+               }
+
+       }
+       else {
+               // sets the initial cookie if no valid one existed
+               call set_last_access_cookie__;
+       }
+
+       // we could clean up req.http.X-WMF-LastStamp + req.http.X-NowDay
+       // here, but they're not being sent anywhere (else) at this point
+       // anyways, so why bother?
+}
diff --git a/templates/varnish/mobile-frontend.inc.vcl.erb 
b/templates/varnish/mobile-frontend.inc.vcl.erb
index 0944840..88a0822 100644
--- a/templates/varnish/mobile-frontend.inc.vcl.erb
+++ b/templates/varnish/mobile-frontend.inc.vcl.erb
@@ -3,8 +3,11 @@
 include "errorpage.inc.vcl";
 include "text-common.inc.vcl";
 include "zero.inc.vcl";
+include "provenance.inc.vcl";
+include "via.inc.vcl";
+include "last-access.inc.vcl";
 
-// Note that analytics.inc.vcl will set an X-Analytics value of proxy=IORG
+// Note that via.inc.vcl will set an X-Analytics value of proxy=IORG
 // without inspecting whether there's an existing proxy=<proxy> key-
 // value pair inside X-Analytics. We do this because if the traffic
 // had come from a known proxy (e.g., Opera or Nokia), that would
@@ -90,6 +93,7 @@
                unset req.http.If-Modified-Since;
        }
 
+       call analytics_last_access_recv;
        call evaluate_cookie_mobile;
        call pass_authorization;
        return (lookup);
@@ -144,4 +148,25 @@
                call geoip_cookie;
        }
 <% end -%>
+
+       // Assemble X-Analytics header
+       // Some of the headers used for X-Analytics are not varied on, so add 
them after the backend processing
+       // Note that vcl_deliver in other files may also modify X-Analytics.
+       if (req.http.X-Forwarded-Proto) {
+               if (resp.http.X-Analytics) {
+                       set resp.http.X-Analytics = resp.http.X-Analytics + 
";https=1";
+               } else {
+                       set resp.http.X-Analytics = "https=1";
+               }
+       }
+
+       if (req.http.X-WMF-UUID) {
+               if (resp.http.X-Analytics) {
+                       set resp.http.X-Analytics = resp.http.X-Analytics + 
";wmfuuid=" + req.http.X-WMF-UUID;
+               } else {
+                       set resp.http.X-Analytics = "wmfuuid=" + 
req.http.X-WMF-UUID;
+               }
+       }
+
+       call analytics_last_access_deliver;
 }
diff --git a/templates/varnish/provenance.inc.vcl.erb 
b/templates/varnish/provenance.inc.vcl.erb
new file mode 100644
index 0000000..852924b
--- /dev/null
+++ b/templates/varnish/provenance.inc.vcl.erb
@@ -0,0 +1,34 @@
+sub vcl_recv {
+       // Avoid cache fragmentation for well-formed provenance parameters
+       // Refer to discussion starting from
+       // 
https://lists.wikimedia.org/pipermail/analytics/2015-February/003426.html
+       // Look for wprov parameter with a value
+       if (req.url ~ "(?i)[?&]wprov=[^&]+") {
+               // Ready a variable for later X-Analytics tagging in 
vcl_deliver.
+               // See https://www.mediawiki.org/wiki/Provenance for reserved 
values.
+
+               // Grab just the value of the wprov parameter, excluding the 
rest of the URL
+               set req.http.X-WMF-WPROV = regsub(req.url, 
"(?i).+[?&]wprov=([^&]+).*", "\1");
+
+               // Remove the wprov=X parameter from req.url to avoid cache
+               // fragmentation using two regexes to cover distinct cases:
+
+               // (1) Simple strip if final query arg:
+               set req.url = regsub(req.url, "(?i)[?&]wprov=[^&]+$", "");
+
+               // (2) When not the final arg, we need to capture the leading
+               //     [?&] to reuse with the parameter that follows:
+               set req.url = regsub(req.url, "(?i)([?&])wprov=[^&]+&", "\1");
+       }
+}
+
+sub vcl_deliver {
+       // In case there was a provenance parameter with a value, add it to 
X-Analytics
+       if (req.http.X-WMF-WPROV) {
+               if (resp.http.X-Analytics) {
+                       set resp.http.X-Analytics = resp.http.X-Analytics + 
";wprov=" + req.http.X-WMF-WPROV;
+               } else {
+                       set resp.http.X-Analytics = "wprov=" + 
req.http.X-WMF-WPROV;
+               }
+       }
+}
diff --git a/templates/varnish/text-frontend.inc.vcl.erb 
b/templates/varnish/text-frontend.inc.vcl.erb
index 955f272..161cc54 100644
--- a/templates/varnish/text-frontend.inc.vcl.erb
+++ b/templates/varnish/text-frontend.inc.vcl.erb
@@ -2,8 +2,11 @@
 
 include "errorpage.inc.vcl";
 include "text-common.inc.vcl";
+include "provenance.inc.vcl";
+include "via.inc.vcl";
+include "last-access.inc.vcl";
 
-// Note that analytics.inc.vcl will set an X-Analytics value of proxy=IORG
+// Note that via.inc.vcl will set an X-Analytics value of proxy=IORG
 // without inspecting whether there's an existing proxy=<proxy> key-
 // value pair inside X-Analytics. We do this because if the traffic
 // had come from a known proxy (e.g., Opera or Nokia), that would
@@ -100,6 +103,7 @@
                unset req.http.If-Modified-Since;
        }
 
+       call analytics_last_access_recv;
        call evaluate_cookie;
        call pass_authorization;
        return (lookup);
@@ -164,4 +168,25 @@
                call geoip_cookie;
        }
 <% end -%>
+
+       // Assemble X-Analytics header
+       // Some of the headers used for X-Analytics are not varied on, so add 
them after the backend processing
+       // Note that vcl_deliver in other files may also modify X-Analytics.
+       if (req.http.X-Forwarded-Proto) {
+               if (resp.http.X-Analytics) {
+                       set resp.http.X-Analytics = resp.http.X-Analytics + 
";https=1";
+               } else {
+                       set resp.http.X-Analytics = "https=1";
+               }
+       }
+
+       if (req.http.X-WMF-UUID) {
+               if (resp.http.X-Analytics) {
+                       set resp.http.X-Analytics = resp.http.X-Analytics + 
";wmfuuid=" + req.http.X-WMF-UUID;
+               } else {
+                       set resp.http.X-Analytics = "wmfuuid=" + 
req.http.X-WMF-UUID;
+               }
+       }
+
+       call analytics_last_access_deliver;
 }
diff --git a/templates/varnish/via.inc.vcl.erb 
b/templates/varnish/via.inc.vcl.erb
new file mode 100644
index 0000000..1736d7e
--- /dev/null
+++ b/templates/varnish/via.inc.vcl.erb
@@ -0,0 +1,18 @@
+sub vcl_deliver {
+       // Add proxy=IORG X-Analytics tag if appropriate.
+       // Although Via: Internet.org usually comes via proxying, it isn't 
guaranteed to come that way.
+       // Nonetheless, as it is tagged with Via and the equipment is under 
Internet.org, we proxy tag.
+       // Note, Internet.org is believed to apply to all Wikimedia sites, so 
this code should run not
+       // just for (m|zero).wikipedia.org and subdomains. Hence the inclusion 
of this file by both
+       // mobile-frontend.inc.vcl.erb and text-frontend.inc.vcl.erb, as 
opposed to a one-off in
+       // zero.inc.vcl.erb alone. See the notes at the top of 
mobile-frontend.inc.vcl.erb and
+       // text-frontend.inc.vcl.erb for more context.
+       if (req.http.Via ~ "(?i)Internet\.org") {
+               if (resp.http.X-Analytics) {
+                       set resp.http.X-Analytics = resp.http.X-Analytics + 
";proxy=IORG";
+               } else {
+                       set resp.http.X-Analytics = "proxy=IORG";
+               }
+       }
+}
+
diff --git a/templates/varnish/zero.inc.vcl.erb 
b/templates/varnish/zero.inc.vcl.erb
index 8d26ed7f..2e005ab 100644
--- a/templates/varnish/zero.inc.vcl.erb
+++ b/templates/varnish/zero.inc.vcl.erb
@@ -2,19 +2,73 @@
 
 // Note: This requires "import header" in the including VCL
 
+import netmapper;
+
+sub vcl_init {
+    // args here are map-name (for .map()), data file, and seconds between 
mtime checks for reload
+    netmapper.init("carriers", "/var/netmapper/carriers.json", 89);
+    netmapper.init("proxies", "/var/netmapper/proxies.json", 89);
+}
+
 sub tag_carrier {
-    // XTP and XC are set up in the common wikimedia.vcl now from netmapper 
data
+    // This is what things look like on entry in four basic scenarios
+    // ("both" here means something MITM-proxied the SSL connection, like 
OperaMini,
+    //   but was kind enough to set an XFF header on the way)
+    // (the leading "..." in XFF is that sometimes there are other local
+    //   proxies on 127.0.0.1 or whatever, according to some OperaMini docs...)
 
-    // Use standard XTP to set the tag_carrier-specific XFB header...
-    set req.http.X-Forwarded-By = req.http.X-Trusted-Proxy;
+    // *********************************************************
+    // *    v> | client.ip | XFF                        | XFP
+    // * direct| client    | (...,)? client             |
+    // * ssl   | ssl       | (...,)? client, ssl        | https
+    // * proxy | proxy     | (...,)? client, proxy      |
+    // * both  | ssl       | (...,)? client, proxy, ssl | https
+    // *********************************************************
 
-    // Use standard XC/XCM to set the tag_carrier-specific X-CS2 header...
-    if (req.http.X-Carrier-Meta) {
-        set req.http.X-CS2 = req.http.X-Carrier + "|" + 
req.http.X-Carrier-Meta;
+    // first, strip the SSL entry from XFF if applicable
+    if (req.http.X-Forwarded-Proto) {
+        set req.http.XFF-NoSSL = regsub(req.http.X-Forwarded-For, ",[^,]+$", 
""); // strips final entry
+    } else {
+        set req.http.XFF-NoSSL = req.http.X-Forwarded-For;
     }
-    else {
-        set req.http.X-CS2 = req.http.X-Carrier;
+
+    // 
*****************************************************************************
+    // *    v>      | XFF-NoSSL             | map("proxies") input from regsub 
below
+    // * direct/ssl | (...,)? client        | client
+    // * proxy/both | (...,)? client, proxy | proxy
+    // 
*****************************************************************************
+
+    // now get the trusted proxy into XFB, if any - the regsub grabs the final 
entry in the list
+    set req.http.X-Forwarded-By = netmapper.map("proxies", 
regsub(req.http.XFF-NoSSL, "^([^,]+, ?)+", ""));
+
+    // normalize to boolean again...
+    if (req.http.X-Forwarded-By == "") {
+        unset req.http.X-Forwarded-By;
     }
+
+    // now, iff XFB was set, strip away one more layer to put the supposed 
client IP at the end
+    if (req.http.X-Forwarded-By) {
+        set req.http.XFF-ClientEnd = regsub(req.http.XFF-NoSSL, ",[^,]+$", "");
+    } else {
+        set req.http.XFF-ClientEnd = req.http.XFF-NoSSL;
+    }
+
+    unset req.http.XFF-NoSSL; // clean up temp var
+
+    // ****************************************************************
+    // *    v>  | XFF-CE         | map("carriers") input from regsub below
+    // *   all  | (...,)? client | client
+    // *****************************************************************
+
+    // Grab client IP from the end of the XFF-CE list and feed to map("zero")
+    set req.http.X-CS2 = netmapper.map("carriers", 
regsub(req.http.XFF-ClientEnd, "^([^,]+, ?)+", ""));
+
+    // normalize to boolean (varnish-3.0.4...)
+    if (req.http.X-CS2 == "") {
+        unset req.http.X-CS2;
+    }
+
+    unset req.http.XFF-ClientEnd; // clean up temp var
 
     // Now the per-carrier stuff...
     // Short of having netmapper (or equivalent) return a set of match 
parameters, I can't see any
@@ -35,7 +89,10 @@
 
     if (req.http.X-Forwarded-By) {
         // Current backend & cache vary on X-Forwarded-By, so in order not to 
fragment cache,
-        // unset X-Forwarded-By.
+        // unset X-Forwarded-By. We still need that value in the resulting 
X-Analytics header,
+        // so create a copy to be used in the vcl_deliver.
+        set req.http.X-Forwarded-By2 = req.http.X-Forwarded-By;
+
         // the backend now checks this value. Soon we will stop varying on it
         if (!req.http.X-CS) {
             unset req.http.X-Forwarded-By;
@@ -57,10 +114,62 @@
         header.append(resp.http.Set-Cookie,"ZeroOpts=tls");
     }
 
-    // Regardless of X-CS transform to "ON" for Vary and/or server-side
-    // purposes, if a carrier was detected at all, set X-CS as an outbound
-    // response header for the application to consume.
-    if (req.http.X-CS2) {
-       set resp.http.X-CS = req.http.X-CS2;
+    // Prepare for outbound X-CS and X-Analytics header assembly.
+    // See following section where actual outbound X-CS and X-Analytics are 
set.
+    // Some of the headers used for X-Analytics are not varied on, so add them 
after the backend processing.
+    // Note that vcl_deliver in other files may also modify X-Analytics
+    if (req.http.X-CS == "ON") {
+        set req.http.X-CS = req.http.X-CS2;
+    }
+
+    // Now that we've updated the req.http.X-CS for postprocessing, set 
outbound headers
+    // and X-Analytics.
+    if (req.http.X-CS) {
+
+        // All mobile responses need to have the X-CS set if it came from a 
zero network.
+        // This allows apps to detect when we switch from zero to non-zero and 
back,
+        // or when the user switches from one network to another and both are 
zero (multi-SIM phones).
+        // The client should constantly check each response for this header, 
and if it changes,
+        // re-request the partner's message info.
+        // This value is only based on Varnish's zero detection, not the 
backend processing of any sort.
+        // When apps see an outbound X-CS, they ask the API for the formal 
enabled/disabled status in a JSON object.
+        // We are in vcl_deliver, so setting the resp.http.X-CS should not 
impact caching; it's postprocessing.
+        set resp.http.X-CS = req.http.X-CS;
+
+        // Some carriers have multiple ip ranges with different rules. IP set 
name is added at
+        // the end of the X-CS, e.g. "123-45|wap" to indicate a wap gateway 
with their own IPs.
+        // To minimize analytics impact, the 'zero' value in X-Analytics will 
stay as carrier id,
+        // and the additional suffix will go into the 'zeronet=xxx' if it was 
present.
+
+        // The following is strictly related to X-Analytics, not outbound X-CS 
for apps.
+        // Remove everything before and including the first '|' char
+        set req.http.X-ZEROSFX = regsub(req.http.X-CS, "^[^\|]*\|?", "");
+        if (req.http.X-ZEROSFX != "") {
+            // Remove everything on and after the first '|' char
+            set req.http.X-CS = "zero=" + regsub(req.http.X-CS, "\|.*", "") + 
";zeronet=" + req.http.X-ZEROSFX;
+        } else if (req.http.X-CS == "404-01b") {
+            // For a short time we had this workaround. This "else if" can be 
removed after 2014-10-15
+            // We only had one carrier, "404-01b", whom we temporarily had to 
create an extra entry until backend was ready.
+            // In order for analytics to report correct stats, we used to 
remove "b" before modifying X-Analytics.
+            set req.http.X-CS = "zero=404-01;zeronet=b";
+        } else {
+            set req.http.X-CS = "zero=" + req.http.X-CS;
+        }
+
+        if (resp.http.X-Analytics) {
+            set resp.http.X-Analytics = resp.http.X-Analytics + ";" + 
req.http.X-CS;
+        } else {
+            set resp.http.X-Analytics = req.http.X-CS;
+        }
+    } else {
+        // Let's ensure no outbound X-CS if pre-analytics X-CS (as determined 
in corresponding if) was not set.
+        unset resp.http.X-CS;
+    }
+    if (req.http.X-Forwarded-By2) {
+        if (resp.http.X-Analytics) {
+            set resp.http.X-Analytics = resp.http.X-Analytics + ";proxy=" + 
req.http.X-Forwarded-By2;
+        } else {
+            set resp.http.X-Analytics = "proxy=" + req.http.X-Forwarded-By2;
+        }
     }
 }

-- 
To view, visit https://gerrit.wikimedia.org/r/243885
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6e4d784c28f38b227bc2661278b80e22d35dc98c
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: Faidon Liambotis <fai...@wikimedia.org>

_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits

Reply via email to