Re: [WikimediaMobile] [reading-wmf] Browse Hypothesis Results

Brian Gerstle Tue, 13 Oct 2015 08:37:31 -0700

Great experiment!  A couple questions/comments:

   1. The % clickthrough per category shows SF Landmarks at 120%. Is that
   correct, and if so, what does it mean?
   2. As a big believer in the power of categories as a driver for
   engagement, I would love to see more variations of this experiment w/
   different placements, in a feed, different categories, add'n of portals, as
   a FTUE, etc. (likely to have a great deal of overlap w/ cascade D: deep
   dive educational experience)
   3. Also loved the win/needs-improvement breakdown at the end


Again, nice work!

On Tue, Oct 13, 2015 at 11:23 AM, Jon Katz <[email protected]> wrote:

> Thanks, Joaquin!
>
> On Tue, Oct 13, 2015 at 4:32 AM, Joaquin Oltra Hernandez <
> [email protected]> wrote:
>
>> Thanks a lot for the detailed report Jon.
>>
>> I've parsed it and posted it to
>> https://www.mediawiki.org/wiki/Reading/Web/Projects/Categories_Browse so
>> that we can keep it more accessible than the mailing list archive
>> <https://lists.wikimedia.org/pipermail/mobile-l/2015-October/009827.html>.
>>
>> Any help with formatting or text corrections would be appreciated.
>>
>>
>> On Sun, Oct 11, 2015 at 8:32 PM, Jon Katz <[email protected]> wrote:
>>
>>> Hi Team,
>>> I just wanted to update you on the results of something we internally
>>> referred to as the 'browse' prototype.
>>> TLDR: as implemented the mobile 'browse by category' test did not drive
>>> significant engagement.  In fact, as implemented, it seemed inferior to
>>> blue links.  However, we started with a very rough and low-impact
>>> prototype, so a few tweaks would give us more definitive results.
>>>
>>> Here is the doc from which I am pasting below:
>>> https://docs.google.com/document/d/1Mqw-awAcp01IcLhHPsHmWsqaAyK1l2-w_LMDtizyFQ4/edit?usp=sharing
>>>
>>> Questions/comments welcome!
>>> Best,
>>>
>>> J
>>>
>>>
>>> Browse Prototype Results
>>>
>>>
>>> 
>>>
>>> Intro
>>> <https://docs.google.com/document/d/1Mqw-awAcp01IcLhHPsHmWsqaAyK1l2-w_LMDtizyFQ4/edit#heading=h.6s40inyan02p>
>>>
>>> Process
>>> <https://docs.google.com/document/d/1Mqw-awAcp01IcLhHPsHmWsqaAyK1l2-w_LMDtizyFQ4/edit#heading=h.d5x661n72t7d>
>>>
>>> Results
>>> <https://docs.google.com/document/d/1Mqw-awAcp01IcLhHPsHmWsqaAyK1l2-w_LMDtizyFQ4/edit#heading=h.naqxa4etwhl4>
>>>
>>> Blue links in general
>>> <https://docs.google.com/document/d/1Mqw-awAcp01IcLhHPsHmWsqaAyK1l2-w_LMDtizyFQ4/edit#heading=h.8nn07h675j3o>
>>>
>>> Category tags
>>> <https://docs.google.com/document/d/1Mqw-awAcp01IcLhHPsHmWsqaAyK1l2-w_LMDtizyFQ4/edit#heading=h.gagragojxpiz>
>>>
>>> Conclusion and Next Steps
>>> <https://docs.google.com/document/d/1Mqw-awAcp01IcLhHPsHmWsqaAyK1l2-w_LMDtizyFQ4/edit#heading=h.z3p82tg8enr>
>>>
>>> Process
>>> <https://docs.google.com/document/d/1Mqw-awAcp01IcLhHPsHmWsqaAyK1l2-w_LMDtizyFQ4/edit#heading=h.ocqtfqhf8n0t>
>>>
>>> Do people want to browse by categories?
>>> <https://docs.google.com/document/d/1Mqw-awAcp01IcLhHPsHmWsqaAyK1l2-w_LMDtizyFQ4/edit#heading=h.9ksw2zvt8q19>
>>> 
>>>
>>>
>>> Intro
>>>
>>> As outlined in this doc
>>> <https://docs.google.com/presentation/d/1ZssE8G0P5WVg8XmkBTi5G3n4OdLHPFGWZDZFW5_DSS0/edit?usp=sharing>,
>>> the concept is a tag that allows readers to navigate WP via categories that
>>> are meaningful and populated in order of 'significance' (as determined by
>>> user input).  The hypothesis:
>>>
>>>    - users will want to navigate by category if there are fewer, more
>>>      meaningful categories per page and those category pages show the most
>>>      ‘notable’ members first.
>>>
>>> Again, see the full doc
>>> <https://docs.google.com/presentation/d/1ZssE8G0P5WVg8XmkBTi5G3n4OdLHPFGWZDZFW5_DSS0/edit?usp=sharing>
>>> to understand the premise.
>>>
>>> Process
>>>
>>> The first step was to validate: do users want to navigate via category?
>>> So we built a very lightweight prototype on mobile web, en wikipedia
>>> (stable, not beta) using hardcoded config variables, in the following
>>> categories ( ~4000 pages).  Here we did not look into sub-categories
>>> with one exception (see T94732
>>> <https://phabricator.wikimedia.org/T94732> for details).  There was
>>> also an error and 2 of the categories did not have tags implemented (struck
>>> through, below)
>>>
>>> Category                                  Pagecount
>>> NBA All Stars                                   400
>>> American Politicians                            818
>>> Object-Oriented Programming Languages           164
>>> European States                                  24
>>> American Female Pop Singers                     326
>>> American drama television series               1048
>>> Modern Painters                                 983
>>> Landmarks in San Francisco, California          270
>>>
>>>
>>>
>>> Here is how it appeared on the Alcatraz page
>>>
>>>
>>> When the user clicked the tag, they were taken to a gather-like
>>> collection based on manually estimated relevance
>>>
>>> (sorry cropped shot)
>>>
>>>
>>>
>>>
>>> The category pages were designed to show the most relevant (as deemed by
>>> me) to the broadest audience, first. Here is the ordering:
>>> https://docs.google.com/spreadsheets/d/12xLXQsH1zcg6E8lDuSonumZNdBvfaBuHOS1a1TCASK4/edit#gid=0
>>>
>>> This was intended to lie in contrast with our current category pages,
>>> which are alphabetical and not really intended for human browsing:
>>> https://en.wikipedia.org/wiki/Category:American_male_film_actors
>>>
>>>
>>> We primarily measured a few things:
>>>
>>>
>>>    - when a tag was seen by a user
>>>    - when a tag was clicked on by a user
>>>    - when a page in the new ‘category view’ was clicked on by a user
>>>
>>>
>>> As a side effort, I looked to see if overall referrals from pages with
>>> tags went up.  This was a timed intervention rather than an a/b test, and
>>> given the low click-through on the tags, any impact would have been
>>> negligible anyway.  The results, which were very noisy, confirmed this.
>>>
>>>
>>> Results
>>> Blue links in general
>>>
>>> One benefit of the side study mentioned in the previous paragraph is
>>> that I was able to generate a table for the pages in question, from before
>>> we started the test, showing the ratio of pageviews referred by a page to
>>> that page's total pageviews (an estimate of how many links were opened
>>> from that page).  It covers just 0-1 GMT on 6/29/15, but now that we have
>>> the pageview hourly table, a more robust analysis can tell us how
>>> categories differ in this regard:
>>>
>>>
>>> Category                                           links clicked   #pvs   clicks/pvs
>>> Category:20th-centuryAmericanpoliticians                     761   1243          61%
>>> Category:Americandramatelevisionseries                      5981   8844          68%
>>> Category:Americanfemalepopsingers                           2502   4280          58%
>>> Category:LandmarksinSanFrancisco,                            104    287          36%
>>> Category:Modernpainters                                      136    369          37%
>>> Category:NationalBasketballAssociationAll-Stars             1908   3341          57%
>>> Category:Object-orientedprogramminglanguages                  48    181          27%
>>> Category:WesternEurope                                       657   1221          54%
>>> Grand Total                                                12099  19766          50%
>>>
>>>
>>> You can see here that for pages in the category ‘Landmarks in San
>>> Francisco’, if there are 10 pageviews, 3.6 clicks to other pages are
>>> generated on average.
>>>
>>> I do not have the original queries for this handy, but can dig them up
>>> if you’re really interested.
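
The clicks/pvs column can be recomputed from the two count columns. The sketch below is only an illustration using the rows of the table above (the category labels are shortened for readability, and the variable names are my own). It also shows two ways to summarise the rows: the unweighted mean of the per-category rates comes out near the 50% in the Grand Total row, while pooling all clicks over all pageviews gives roughly 61%.

```python
# Rows copied from the "Blue links in general" table:
# (category, links clicked, pageviews). Labels are shortened here.
ROWS = [
    ("20th-century American politicians", 761, 1243),
    ("American drama television series", 5981, 8844),
    ("American female pop singers", 2502, 4280),
    ("Landmarks in San Francisco", 104, 287),
    ("Modern painters", 136, 369),
    ("NBA All-Stars", 1908, 3341),
    ("Object-oriented programming languages", 48, 181),
    ("Western Europe", 657, 1221),
]


def click_rate(clicks: int, pageviews: int) -> float:
    """Links opened per pageview, as a fraction."""
    return clicks / pageviews


per_category = {name: click_rate(c, p) for name, c, p in ROWS}

# Summary 1: average the eight category rates, each category counting equally.
unweighted_mean = sum(per_category.values()) / len(per_category)

# Summary 2: pool the counts, so high-traffic categories weigh more.
# The click rows sum to 12097, two fewer than the 12099 in the Grand Total row.
total_clicks = sum(c for _, c, _ in ROWS)
total_pvs = sum(p for _, _, p in ROWS)
pooled = click_rate(total_clicks, total_pvs)

for name, rate in per_category.items():
    print(f"{name:40s} {rate:6.1%}")
print(f"{'unweighted mean of rates':40s} {unweighted_mean:6.1%}")
print(f"{'pooled clicks / pooled pageviews':40s} {pooled:6.1%}")
```

Which summary is the right one depends on the question: the unweighted mean describes a typical category, the pooled ratio describes a typical pageview.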
>>>
>>> Category tags
>>>
>>> Full data and queries here:
>>> https://docs.google.com/a/wikimedia.org/spreadsheets/d/1vD3DopxGyeh9FQsuTQDMo6f5y43Yoy5gnJQqKn9hEQg/edit?usp=sharing
>>>
>>> The tags themselves generated an average click-through rate of 0.18%.
>>> Given that the overall click-through rate on the pages, estimated above,
>>> is ~50%, this single tag is not driving anything significant.  Furthermore,
>>> Leila and Bob’s paper suggests that this is performing no better than a
>>> mid-article link.  However, given that mobile web sections are collapsed, I
>>> would need to understand more about their method to know just how to
>>> interpret their results against our mobile-web-only implementation.  Also,
>>> our click-through rate used the number of times the tag appeared on screen
>>> as the denominator, whereas their research looked at overall pageviews.
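
To make the denominator point concrete, here is a minimal sketch. All counts are hypothetical and chosen only so that the impression-based rate lands on 0.18%; the share of pageviews on which the tag scrolls into view (40% here) is an assumption, not a measured value.

```python
def click_through_rate(clicks: int, denominator: int) -> float:
    """Clicks divided by whichever exposure count is used as the denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return clicks / denominator


# Hypothetical counts, for illustration only.
tag_clicks = 18
tag_impressions = 10_000  # times the tag actually appeared on screen
pageviews = 25_000        # assumes the tag was seen on 40% of pageviews

ctr_per_impression = click_through_rate(tag_clicks, tag_impressions)
ctr_per_pageview = click_through_rate(tag_clicks, pageviews)

print(f"per impression: {ctr_per_impression:.2%}")
print(f"per pageview:   {ctr_per_pageview:.3%}")
```

With the same clicks, the pageview-based rate is always the smaller of the two, so an impression-based figure should be scaled down before it is compared with research that divides by pageviews.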
>>>
>>>
>>> This being noted, the tag was implemented to be as obscure as possible
>>> to establish a baseline.  Furthermore, any feature like this would probably
>>> be different in the following ways:
>>>
>>>    - each page would be in 1-4 tag groups (as opposed to just 1)
>>>    - each page would be tagged, creating the expectation on the part of
>>>      the user that this was something to look for
>>>    - presumably the categories could be implemented as a menu item as
>>>      opposed to being buried at the bottom of the page (and competing with
>>>      features like read more).
>>>    - Using the learnings from ‘read more’, tags with images or buttons
>>>      would likely fare much better.
>>>
>>>
>>> The following graph shows:
>>>
>>>    - number of impressions on the right axis
>>>    - click-thru-rate on the left axis.
>>>
>>>
>>>
>>> When you look at click-through rates on the ‘category’ pages themselves,
>>> you see that they average 41% (chart below), meaning that for every 10
>>> times a user visited a category page, there were 4.1 clicks to one of the
>>> listed pages as a result.
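
A rate defined as clicks per visit is a ratio rather than a share of visits, so it is not capped at 100%: a visit with several clicks counts several times. The sketch below uses two made-up visit logs (not data from the experiment) to contrast clicks per visit with the share of visits that had at least one click.

```python
def clicks_per_visit(log: list[int]) -> float:
    """Total article clicks divided by the number of category-page visits."""
    return sum(log) / len(log)


def share_of_visits_with_click(log: list[int]) -> float:
    """Fraction of visits in which at least one article was clicked."""
    return sum(1 for clicks in log if clicks > 0) / len(log)


# Hypothetical logs: one entry per category-page visit, value = clicks in that visit.
light_use = [0, 0, 0, 1, 0, 2, 0, 0, 1, 0]  # 4 clicks over 10 visits
heavy_use = [3, 0, 2, 1]                    # 6 clicks over 4 visits

for label, log in (("light use", light_use), ("heavy use", heavy_use)):
    print(
        f"{label}: {clicks_per_visit(log):.0%} clicks per visit, "
        f"{share_of_visits_with_click(log):.0%} of visits with a click"
    )
```

Under this definition a per-category figure above 100% is arithmetically possible, although whether that explains any particular number in the charts would need to be checked against the queries.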
>>>
>>>
>>> Here is the same broken up by category:
>>>
>>>
>>> Each ‘category’ page here had at least 400 visits, and you can see that
>>> the interest seems to vary dramatically across categories.  It is worth
>>> noting that the top three categories here are the ones with the fewest
>>> entities.  Each list, however, was capped at ~50 articles, so it is unclear
>>> what might be causing this effect, if it is real.
>>>
>>> As mentioned above, the average article page has an overall click rate
>>> of 50%, so these ‘category’ pages did not reach the click-through rate of
>>> a typical article page.  However, each ‘category’ page had summaries of its
>>> member pages, so it could be that users were getting value beyond what a
>>> blue link would provide.  A live-user test of Gather collections, from
>>> which this format was borrowed, suggested that the format used up too much
>>> vertical space on each article and was hard to flip through.  Shortening
>>> the amount of text or image space might be something to try to make the
>>> page more useful.
>>>
>>> Conclusion and Next Steps
>>> Process
>>>
>>>    - This was the first time I am aware of that we ran a live prototype
>>>      and learned something without building a scalable solution. Win
>>>    - Developer time was estimated at 1 FTE for 2 weeks (by pheudx), but
>>>      the chronological time for pushing to stable took a quarter. Room
>>>      for improvement
>>>    - The time to analysis was almost 2 quarters, due to a lack of data
>>>      analysis support (I ran the initial analysis within 2 weeks of launch,
>>>      during paternity leave, but was unable to go back and get it ready to
>>>      distribute for 3 months).  Room for improvement--possibly solved by
>>>      additional Data Analyst.
>>>
>>>
>>> This experiment was not designed to answer questions definitively in one
>>> round, but with the understanding that multiple iterations would allow us
>>> to fully answer our questions.
>>>
>>> The long turn-around time, particularly around analysis and
>>> communication, meant that tweaking a variable to test the conclusions or
>>> the new questions that arose (see below) will involve a whole lot more work
>>> and effort than if we had been able to explore modifications within a few
>>> weeks of the initial launch.
>>>
>>>
>>> Do people want to browse by categories?
>>>
>>> Category tags at the bottom of the mobile web page in a dull gray
>>> background that lead to manually curated categories are not a killer
>>> feature :)
>>>
>>> I would be reluctant to say that this means users are not interested in
>>> browsing by category, however.  For instance, it is likely that
>>>
>>>    - users did not notice the tag, even if it appeared on screen
>>>    - users are accustomed to our current category tags on desktop and not
>>>      interested in that experience
>>>    - users who did like the tag were unlikely to find another page that
>>>      had it--there was no feedback mechanism by which the improved category
>>>      page would drive additional tag interactions
>>>    - the browse experience created was not ideal
>>>
>>>
>>>
>>>
>>> If we decide to pursue what is currently termed “cascade c: update ux”,
>>> I would like to proceed with more tests in this arena, by altering the
>>> appearance and position of the tags, and by improving the flow of the
>>> ‘category’ pages.  If we choose a different strategy, hopefully other teams
>>> can build off of what was learned here.
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> reading-wmf mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/reading-wmf
>>>
>>>
>>
>> _______________________________________________
>> Mobile-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/mobile-l
>>
>>
>


-- 
EN Wikipedia user page: https://en.wikipedia.org/wiki/User:Brian.gerstle
IRC: bgerstle
