I believe we did a one-question gender microsurvey before (linked to one of the new-user features?). I don't know whether the data was useful or not, but I do remember the act of asking the question itself got some pushback as being invasive/unwelcoming/weirdly communicated/etc. (and I can certainly sympathise with this).
So as well as the value of the data, we should consider whether the act/method of asking is going to have knock-on effects on what we're trying to measure.

Andrew.

On 28 August 2014 20:55, Jonathan Morgan <[email protected]> wrote:
> Stepping back...
>
> We all seem to agree that user-set gender preference is a problematic measure. We don't trust it. We can come up with plausible hypotheses for why someone would mis-report their gender. And we can be almost certain it's not a representative sample.
>
> Do we have any ideas for what a better measure would be? Seems to me that we're dealing with self-report data no matter what. But perhaps a more explicit elicitation would be better? Folks have suggested a one-question gender microsurvey before. Of course that will come with its own sources of bias, and I don't quite see how we can control for them.
>
> Given that it would be useful to have some data on gendered editing patterns (whether we share it publicly or not), what are our options?
>
> - Jonathan
>
> On Thu, Aug 28, 2014 at 10:03 AM, Ryan Kaldari <[email protected]> wrote:
>>
>> And because I know someone is going to point this out... Actually, restricting the data to only editors who have explicitly set their gender would not completely control for changes in the rate of setting the preference, since that rate could change differently for men and women. It would at least help to control for overall changes in the rate, for example due to the change in the interface that Steven mentioned.
>>
>> Kaldari
>>
>> On Aug 28, 2014, at 9:50 AM, Ryan Kaldari <[email protected]> wrote:
>>
>> We could restrict the query to only look at editors who had explicitly set their gender preference. That would control for changes in the rate of setting the preference. The data would then only be biased by users who had explicitly set their gender to the incorrect gender, which I imagine would be a very small percentage.
>>
>> Also, I would like to point out that even our most fundamental metrics are affected by similar biases and inconsistencies. For example, the rate of new editors is polluted by long-time IP editors who suddenly decide to create an account. If there is an increase in IP editors converting to registered editors, it can mislead us into thinking that we are suddenly attracting a lot of new editors. This is just one of many examples I'm sure you're already familiar with.
>>
>> To answer your question though, I think that if we notice something interesting in the data (especially a downward trend), we would start a discussion about it (as we would with any interesting data) and hopefully inspire someone to dig deeper. Right now, though, we are mostly in the dark. See, for example, Phoebe's most recent email to the gendergap list lamenting the lack of research and data.
>>
>> Kaldari
>>
>> On Thu, Aug 28, 2014 at 1:43 AM, Aaron Halfaker <[email protected]> wrote:
>>>
>>> I think the biggest problem is this:
>>>
>>> Let's say that we see the proportion of users who set their gender preference to female falling. Is that because women are becoming less likely to set their gender preference, or because the ratio is actually becoming more extreme?
>>>
>>> Let's say that we see a trend in the messy data. What do we do about that? Do we assume that it is a change in the actual ratio?
>>> Do we assume that it is a change in the propensity of females to set their gender preference, and there's nothing for us to do? Or do we then decide that it is important for us to gather good data so that we can actually know what's going on?
>>>
>>> -Aaron
>>>
>>> On Thu, Aug 28, 2014 at 4:50 AM, Ryan Kaldari <[email protected]> wrote:
>>>>
>>>> On Tue, Aug 26, 2014 at 9:53 AM, Leila Zia <[email protected]> wrote:
>>>>>
>>>>> 1. We look at the self-reported gender data and do some simple observations.
>>>>> Pros:
>>>>> + we will have an updated view of the gender gap problem.
>>>>> + we may spread seeds for further internal and/or external research about it.
>>>>> Cons:
>>>>> - If simple observations are not communicated properly, they will result in misinformation that can possibly do more harm than good.
>>>>> - The results will be very limited, given that we know the data is very limited and contains biases.
>>>>
>>>> I would definitely like to avoid spreading misinformation, which is why I proposed only looking at the percentage change per month rather than raw numbers or raw percentages. The raw numbers are almost certainly off-base and would be much more likely to be latched onto by the public and the media. Percentage change per month is a less 'sexy' statistic, but it might give us better clues about what's actually going on with the gender gap over time. It would also, for the first time, give us some window into how new features or issues may be actively affecting the gender gap. But again, it would only be a canary in a coal mine, not a tool to draw reliable conclusions from. For that, we need more extensive tools and analysis.
>>>>
>>>>> 2. We do extensive gender gap analysis internally.
>>>>> Proper gender gap analysis, in a way that can result in meaningful interventions (think products and features by us or the community), requires one person from R&D to work on it almost full time for a long period (at least six months, more probably a year). In this case, the question becomes: how should we prioritize this question? Just to give you some context: which of the following areas should this one person from R&D work on?
>>>>> * reducing the gender gap
>>>>> * increasing editor diversity in terms of nationality/language/...
>>>>> * increasing the number of active editors independent of gender
>>>>> * identifying the areas where Wikipedia's coverage is weakest and finding editors who can contribute to those areas
>>>>> * ...
>>>>
>>>> I think it's very difficult to judge how to set those priorities without having more data. We know that the active editors number is on a downward trajectory. Is the nationality/language diversity increasing or decreasing? Is the gender gap increasing or decreasing? In cases where things are actively getting worse, we should set our priorities to address them sooner, but without knowing those trajectories it's impossible to say.
>>>>
>>>> Kaldari
>
> --
> Jonathan T. Morgan
> Learning Strategist
> Wikimedia Foundation
> User:Jmorgan (WMF)
> [email protected]

--
- Andrew Gray
[email protected]

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
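As a rough illustration of the approach discussed in the thread, the sketch below restricts to editors who have explicitly set the gender preference, computes the female share among them for each month, and reports the month-over-month percentage change of that share rather than raw numbers or raw percentages. This is purely a sketch: all counts are invented placeholders, not Wikimedia data, and a real analysis would pull monthly counts of active editors grouped by the 'gender' user preference from the MediaWiki databases.

# Minimal sketch of the metric discussed in the thread. All counts below are
# invented placeholders for illustration only.
from collections import OrderedDict

# Hypothetical monthly counts of active editors by self-reported preference.
# 'unknown' covers editors who never set the preference; they are excluded,
# per the suggestion to restrict to explicitly set preferences.
monthly_counts = OrderedDict([
    ("2014-05", {"female": 410, "male": 3920, "unknown": 61200}),
    ("2014-06", {"female": 405, "male": 3890, "unknown": 60100}),
    ("2014-07", {"female": 398, "male": 3955, "unknown": 59800}),
    ("2014-08", {"female": 402, "male": 3970, "unknown": 59200}),
])

def female_share(counts):
    """Female share among editors who explicitly set a gender preference."""
    explicit = counts["female"] + counts["male"]
    return counts["female"] / explicit if explicit else None

months = list(monthly_counts)
shares = {month: female_share(monthly_counts[month]) for month in months}

# Month-over-month percentage change of the share -- the trend statistic
# proposed in the thread, rather than raw numbers or raw percentages.
for prev, curr in zip(months, months[1:]):
    change = (shares[curr] - shares[prev]) / shares[prev] * 100
    print(f"{curr}: female share {shares[curr]:.2%} ({change:+.2f}% vs. {prev})")

Note that, as pointed out above, excluding editors with no explicit preference only controls for overall changes in how often the preference is set; it does not control for men and women changing their propensity to set the preference at different rates.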
