We've deployed the change to bucketing, but we are still seeing the same
issue in the collected data.

Again, we generate a unique 64-bit random number when the user gets to
the page, yet we are seeing the same 64-bit number being reported by
multiple IP addresses.

Since deploying the new schema number with the updated bucket selection, we
have seen 13 distinct tokens coming from 42 distinct IP addresses. This
shouldn't be possible, since each token is generated for a single page view.

mysql:[email protected] [log]> select count(distinct clientIp) from CompletionSuggestions_13630018;

+--------------------------+
| count(distinct clientIp) |
+--------------------------+
|                       42 |
+--------------------------+
1 row in set (0.00 sec)

mysql:[email protected] [log]> select count(distinct event_pageViewToken) from CompletionSuggestions_13630018;

+-------------------------------------+
| count(distinct event_pageViewToken) |
+-------------------------------------+
|                                  13 |
+-------------------------------------+
1 row in set (0.00 sec)
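
The two counts above only show the mismatch in aggregate. To see which
tokens are actually shared, one option is to group by token and count the
distinct IPs per token. Below is a minimal sketch of that query, run against
an in-memory SQLite table so it is self-contained. The sample rows and their
values are made up for illustration; only the table and column names come
from the queries above, and the same SELECT should work unchanged in MySQL.

```python
import sqlite3

# Hypothetical sample rows (token, client IP); not real log data.
rows = [
    ("token_a", "10.0.0.1"),
    ("token_a", "10.0.0.2"),
    ("token_a", "10.0.0.2"),
    ("token_b", "10.0.0.3"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CompletionSuggestions_13630018 "
    "(event_pageViewToken TEXT, clientIp TEXT)"
)
conn.executemany(
    "INSERT INTO CompletionSuggestions_13630018 VALUES (?, ?)", rows
)

# List every token reported by more than one distinct IP address,
# together with how many IPs and how many events it has.
QUERY = """
SELECT event_pageViewToken,
       COUNT(DISTINCT clientIp) AS ips,
       COUNT(*) AS events
FROM CompletionSuggestions_13630018
GROUP BY event_pageViewToken
HAVING COUNT(DISTINCT clientIp) > 1
ORDER BY ips DESC
"""
shared_tokens = conn.execute(QUERY).fetchall()
print(shared_tokens)
```

If every token were tied to a single page view from a single client, this
query would return no rows; any rows it does return are the tokens worth
inspecting by hand.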



My best guess at this point is that something has changed in the way these
clientIp values are collected, and that the collected values are incorrect.


On Mon, Sep 14, 2015 at 1:32 PM, Erik Bernhardson <
[email protected]> wrote:

> Thanks for taking a look over this. I've incorporated your suggestions
> into a patch[1] and if all looks good will send that out in SWAT. We should
> be able to look at the data collected overnight and see if things are more
> sane tomorrow.
>
> [1] https://gerrit.wikimedia.org/r/#/c/238306/
>
> On Mon, Sep 14, 2015 at 11:56 AM, Gergo Tisza <[email protected]>
> wrote:
>
>> You are queueing a logging callback every time a request is sent (which
>> is roughly every time the user types another character in the search box)
>> until the tracking module finishes loading and mw.searchSuggest.request is
>> restored. On a slow connection the user might type several characters and
>> trigger several log events by then. If you filter for queries from the same
>> non-unique IP, you will probably see something like "a", "ab", "abc"...
>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>
