Excellent summary. Please make sure this is on wiki as well.

Thanks

Kevin
On Dec 10, 2015 8:05 AM, "Oliver Keyes" <[email protected]> wrote:

> Totally unrelated to my previous email, I promise. This is just me
> writing down my thinking on how A/B testing works, and how it applies
> to the portal (www.wikipedia.org) experiments and the schema we have
> deployed there.
>
> A/B testing is a common way of identifying if a proposed change to a
> piece of software is actually an improvement or not: it consists of
> taking a sample of users and dividing them into two groups, the "A"
> and "B" groups (hence the name). One group is consistently given the
> experimental change (the "test" group). One group is consistently
> given the default experience (the "control" group). Users are
> pseudorandomly sorted into one group or the other, so that the two
> groups are roughly equal in size.
> The end outcome for both groups is compared, and the change is
> successful if users in the test group are statistically significantly
> more likely to experience a better outcome than the users in the
> control group.
>
> When we put together the schema for the Portal we did it after months
> of experimenting with the Cirrus A/B tests, which means that we tried
> to structure it to take into account the lessons we learned there. We
> discovered that things were simpler the more fields you had, and that
> maintaining a base population who were not participating in any tests
> was ideal for dashboarding. Accordingly the schema tracks every KPI we
> care about for the portal and contains a "cohort" field that indicates
> if someone is in the "A" group, the "B" group, or no group whatsoever
> - with the idea that most users at any one time would be in /no/ group
> and we could rely on that population for dashboarding! That way we can
> handle everything with one schema.
>
> So the things to remember when setting up Portal tests:
>
> 1. The test and control groups should be even.
> 2. The test and control groups should (together) make up a very small
> chunk of the total people getting the logging. 10% combined, say.
> 3. The test and control groups should both be represented with "cohort"
> values, with nothing (to produce a MySQL NULL) for the rest of the
> population.
>
> That way we can both test and dashboard simultaneously.
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>
> _______________________________________________
> discovery mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/discovery
>
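
For the wiki version, the three rules in Oliver's list could be sketched
roughly as below. This is only an illustration, not how the portal code
actually assigns cohorts: the function name assign_cohort, the session_id
argument, the sample_rate parameter and the choice of hashing a session ID
are all assumptions made for the example. Unsampled sessions get None,
which would be stored as a MySQL NULL in the "cohort" field.

```python
import hashlib


def assign_cohort(session_id, sample_rate=0.10):
    """Return "A", "B", or None for a session (hypothetical helper).

    sample_rate is the combined share of sessions placed in a test;
    the remainder get None and form the dashboarding population.
    """
    # Hash the ID so the same session always lands in the same cohort.
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket >= sample_rate:
        return None  # not in any test
    # Split the sampled slice in half so A and B are even.
    return "A" if bucket < sample_rate / 2 else "B"


if __name__ == "__main__":
    counts = {"A": 0, "B": 0, None: 0}
    for i in range(20000):
        counts[assign_cohort("session-%d" % i)] += 1
    print(counts)
```

Hashing rather than calling a random number generator on each event is one
way to keep a user "consistently" in the same group, as the description of
A/B testing above requires.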