Excellent summary. Please make sure this is on the wiki as well. Thanks,
Kevin

On Dec 10, 2015 8:05 AM, "Oliver Keyes" <[email protected]> wrote:
> Totally unrelated to my previous email, I promise. This is just me
> writing down my thinking on how A/B testing works, and how it applies
> to the portal (www.wikipedia.org) experiments and the schema we have
> deployed there.
>
> A/B testing is a common way of identifying whether a proposed change
> to a piece of software is actually an improvement: it consists of
> taking a sample of users and dividing them into two groups, the "A"
> and "B" groups (hence the name). One group is consistently given the
> experimental change (the "test" group). The other group is
> consistently given the default experience (the "control" group).
> Users are pseudorandomly sorted into each group, so that the two
> groups are the same size. The outcomes for both groups are compared,
> and the change is successful if users in the test group are
> statistically significantly more likely to experience a better
> outcome than users in the control group.
>
> When we put together the schema for the Portal, we did it after
> months of experimenting with the Cirrus A/B tests, which means that
> we tried to structure it to take into account the lessons we learned
> there. We discovered that things were simpler the more fields you had,
> and that maintaining a base population who were not participating in
> any tests was ideal for dashboarding. Accordingly, the schema tracks
> every KPI we care about for the portal and contains a "cohort" field
> that indicates whether someone is in the "A" group, the "B" group, or
> no group whatsoever, with the idea that most users at any one time
> would be in /no/ group and we could rely on that population for
> dashboarding! That way we can handle everything with one schema.
>
> So the things to remember when setting up Portal tests:
>
> 1. The test and control groups should be even (the same size);
> 2. The test and control groups should together make up a very small
>    chunk of the total people getting the logging: 10% combined, say;
> 3. The test and control groups should both be represented with
>    "cohort" values, with no value (producing a MySQL NULL) for the
>    rest of the population.
>
> That way we can both test and dashboard simultaneously.
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>
> _______________________________________________
> discovery mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/discovery
>
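A minimal sketch of the three rules in the quoted email, in Python. It assumes a hypothetical string session ID per portal visitor and a hypothetical 10% combined sampling rate (the email's "say" figure, not a confirmed setting). Hashing the ID makes assignment pseudorandom but consistent, so a returning session stays in the same group, and `None` stands in for the NULL cohort. The two-proportion z-test is one common way to check significance for a yes/no outcome such as clickthrough; it is an illustrative choice, not necessarily what the Discovery team used.

```python
import hashlib
import math
from collections import Counter
from typing import Optional, Tuple

# Hypothetical: 10% of logged sessions join the test, split evenly
# between "A" and "B". Everyone else gets no cohort (None -> MySQL NULL).
TEST_FRACTION = 0.10


def assign_cohort(session_id: str, test_fraction: float = TEST_FRACTION) -> Optional[str]:
    """Deterministically map a (hypothetical) session ID to "A", "B", or None."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Keep 53 bits of the hash so the division lands exactly in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    if bucket < test_fraction / 2:
        return "A"  # control: default experience
    if bucket < test_fraction:
        return "B"  # test: experimental change
    return None  # in no test; this population feeds the dashboards


def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both groups share one success rate.

    A positive z means group B's success rate is higher than group A's.
    """
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all outcomes identical; test is undefined")
    z = (successes_b / n_b - successes_a / n_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Expect roughly 5% of sessions in "A", 5% in "B", the rest None.
    counts = Counter(assign_cohort(f"session-{i}") for i in range(20000))
    print(counts)

    # Made-up clickthrough counts, purely for illustration.
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

Because the test and control buckets are adjacent slices of the same hash range, they stay equal in size by construction, and widening or narrowing `TEST_FRACTION` leaves the untested (NULL) population as the large remainder used for dashboarding.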
