After reviewing a week's worth of data for the common terms A/B test, we
have decided that we have not collected enough information. The initial
sampling was:

1:1000 users chosen to participate in test
Those users split into 6 buckets, giving each bucket a 1:6000 sampling

This has collected ~100 events per bucket, and far fewer in the "strict"
bucket.

We are increasing the main sampling by 5x, to 1:200. This will give each
bucket a 1:1200 sampling of users. The buckets collect so little data
because quite a few queries don't meet the minimum requirements to be
affected by the tests. The "aggressive recall" test requires at least 3
words in the query, and the "strict" test requires at least 6 words in the
query.
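
For anyone who wants to check the arithmetic, here is a rough sketch of the
sampling math and the word-count cutoffs described above. It is not the
actual test code: the function names, the bucket keys, the whitespace-based
word count, and the assumption that event counts scale linearly with the
sampling rate are all mine, for illustration only.

```python
from fractions import Fraction

NUM_BUCKETS = 6

# Minimum query length per test, taken from the text above. The dictionary
# keys are hypothetical labels, not the real bucket identifiers.
MIN_WORDS = {"aggressive_recall": 3, "strict": 6}


def per_bucket_rate(main_rate, buckets=NUM_BUCKETS):
    """Fraction of all users that lands in any single bucket."""
    return main_rate / buckets


def is_eligible(query, bucket):
    """True if the query has enough words to be affected by the bucket's test.

    Assumption: words are counted by splitting on whitespace, and buckets
    not listed in MIN_WORDS have no minimum.
    """
    return len(query.split()) >= MIN_WORDS.get(bucket, 0)


old_rate = per_bucket_rate(Fraction(1, 1000))  # 1 in 6000 users
new_rate = per_bucket_rate(Fraction(1, 200))   # 1 in 1200 users
scale = new_rate / old_rate                    # 5x more users per bucket

# Assumption: events grow linearly with the sampling rate, so ~100 events
# per bucket per week would become roughly 500.
projected_events = 100 * scale

print(f"old per-bucket rate: 1:{old_rate.denominator}")
print(f"new per-bucket rate: 1:{new_rate.denominator}")
print(f"projected events per bucket per week: ~{int(projected_events)}")
```

Under those assumptions, a three-word query counts toward the "aggressive
recall" test but not toward the "strict" test, which is consistent with the
"strict" bucket collecting the fewest events.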

Erik B.