I re-ran the sessions job including IP in the output. Several things:

- I'm happy to report that we are correctly filtering out the WMF public IPs, though there are about 100k hits per day from 10.x.x.x IPs (about 0.5%, LVS health checks) that we missed. We'll update the filter to include those.
- So, who is it? I ran the IPs of the top sessions through whois and tried to extract the org name. The results (omitting IP for privacy reasons) are here: https://docs.google.com/a/wikimedia.org/spreadsheet/ccc?key=0Ai_u2wTiMldddHNrZVNVemF4MndaMTJLNnB6eGlQOHc#gid=0

A pretty interesting list.

--
David Schoonover
[email protected]

On Thu, Apr 25, 2013 at 10:38 AM, Haitham Shammaa <[email protected]> wrote:

> Maryana, that Wikipedia article is about a TV series which has been
> broadcast since 2006, but I don't think it's very popular.
>
> On the other hand, nobody seems to mention the crab Big Daddy in the
> Japanese internet culture.
>
> *--*
> *Haitham Shammaa*
> *Contribution Research Manager*
> *Wikimedia Foundation*
>
> *Imagine a world in which every single human being can freely share in
> the sum of all knowledge.*
> *Click the "edit" button now, and help us make it a reality!*
>
>
> On Thu, Apr 25, 2013 at 10:19 AM, Maryana Pinchuk
> <[email protected]> wrote:
>
>> On Wed, Apr 24, 2013 at 9:17 PM, Dario Taraborelli <
>> [email protected]> wrote:
>>
>>> Dave,
>>>
>>> thanks for sharing this, the referral data is particularly fascinating.
>>> I mentioned during the quarterly review that I'd love to get a better sense
>>> of (1) the proportion of requests in the mobile request logs lacking a
>>> referral, (2) the possible causes of this gap and (3) to what extent these
>>> missing entries introduce a bias in the referral ranking.
>>>
>>> The 3rd most popular query (according to your dumps) is ビッグダディ (Japanese
>>> for "Big Daddy"), which presumably refers to this guy:
>>> http://metro.co.uk/2013/03/20/giant-japanese-spider-crab-big-daddy-arrives-at-blackpool-sea-life-centre-3550751/
>>> What's interesting is that there's no such entry on the Japanese
>>> Wikipedia and I am baffled that people may have landed on the website via a
>>> search engine query for a non-existing article.
>>> Do you have an explanation for this or am I misinterpreting what you
>>> mean by search query?
>>>
>>
>> There *is* an article on this on ja.wiki :) It may have been renamed
>> since then, but it's still the 2nd Google hit for ビッグダディ:
>> http://ja.wikipedia.org/wiki/%E7%97%9B%E5%BF%AB!%E3%83%93%E3%83%83%E3%82%B0%E3%83%80%E3%83%87%E3%82%A3
>>
>>
>>>
>>> Dario
>>>
>>> On Apr 24, 2013, at 8:40 PM, David Schoonover <[email protected]> wrote:
>>>
>>> Hiya all,
>>>
>>> As promised earlier today in the Analytics weekly showcase, I've got a
>>> few interesting bits of data to share from playing with the new Mobile Site
>>> Sessions dataset.
>>>
>>>
>>> # Visits to Mobile Site, 4/21/2013
>>>
>>> - Total Visits: 51,624,103
>>> - Unique Visitors: 37,736,120
>>> - Total Pageviews: 104,972,033
>>> - Avg Pageviews per Session: 2.0334
>>> - Max Pageviews in one Session: 141,882
>>>
>>> ## Standard Site
>>> - Visits: 51,603,221
>>> - Unique Visitors: 37,723,188
>>> - Pageviews: 104,910,382
>>> - Avg Pageviews per Session: 2.033
>>>
>>> ## Alpha Site
>>> - Visits: 986
>>> - Unique Visitors: 822
>>> - Pageviews: 7,087
>>> - Avg Pageviews per Session: 7.188
>>>
>>> ## Beta Site
>>> - Visits: 19,896
>>> - Unique Visitors: 16,235
>>> - Pageviews: 54,564
>>> - Avg Pageviews per Session: 2.742
>>>
>>>
>>> ## Notes
>>> - A session (or "visit") is defined as all activity with less than 30
>>> minutes between each hit. Intuitively speaking, a session ends when the
>>> user hasn't done anything in 30m.
>>> - As we do not set visitor_id cookies for all users, the "unique
>>> visitors" metric was calculated using hash(ip_address + users_agent) as
>>> visitor_id.
>>> - This job looked at all requests to the mobile site on 4/21/2013, which
>>> is 75.17 GB of request logs.
>>> - The job took ~17 minutes to process the day into 15.3 GB of sessions.
>>> - The summary above took maybe 10 minutes to set up/write in Hive, and
>>> the job took maybe 7 minutes.
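For readers who want to see the session definition from the Notes in concrete form, here is a minimal Python sketch. It is not the actual job (which ran in Hive): the tuple layout of a hit, the function names `visitor_id` and `sessionize`, and the choice of SHA-1 as the hash are all assumptions made for illustration.

```python
import hashlib
from itertools import groupby
from operator import itemgetter

# A gap of 30 minutes or more between two hits ends the session.
SESSION_GAP_SECONDS = 30 * 60


def visitor_id(ip_address, user_agent):
    """Fallback visitor key built from IP + user agent (SHA-1 is an assumption)."""
    return hashlib.sha1((ip_address + user_agent).encode("utf-8")).hexdigest()


def sessionize(hits):
    """Group (timestamp_seconds, ip_address, user_agent) hits into sessions.

    Returns a list of (visitor_id, [timestamps]) tuples, one per session.
    """
    # Sort by visitor, then by time, so each visitor's hits are contiguous.
    keyed = sorted((visitor_id(ip, ua), ts) for ts, ip, ua in hits)
    sessions = []
    for vid, group in groupby(keyed, key=itemgetter(0)):
        current = []
        for _, ts in group:
            # Start a new session when the visitor was idle for 30m or more.
            if current and ts - current[-1] >= SESSION_GAP_SECONDS:
                sessions.append((vid, current))
                current = []
            current.append(ts)
        sessions.append((vid, current))
    return sessions
```

With the sessions in hand, the summary metrics fall out directly: visits is `len(sessions)`, unique visitors is the number of distinct visitor ids, and average pageviews per session is total hits divided by visits.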
>>>
>>>
>>> In addition to that summary, I ran a few jobs on the entry_referer field
>>> -- the URL that referred the user to us when the session started. Obvious
>>> caveats: this is only one day of data, and it's only the mobile site. Draw
>>> conclusions with care.
>>>
>>> First, I pulled out the top referring domains. It's mostly as you'd
>>> expect -- search engines -- though you'll also note that several Wikipedia
>>> mobile sites show up. My working hypothesis is that people don't tend to
>>> close tabs on smartphones; when they later come back, it is often to an
>>> open Wikipedia tab: clicking a link or performing a search means the
>>> referrer is still us.
>>>
>>> Since -- as expected -- so much of the data pertained to search engines,
>>> I also calculated the top search queries and top keywords that sent people
>>> to us. (For keywords, I've filtered out common "stop words": de, of, in,
>>> is, la, and, el, es, to, en, di, los, le, da, se, las, les, il, du, a, i,
>>> o, y, e.) In both, you see the predictable: lots of searches for porn, for
>>> "facebook", for "wiki", etc. But you also see a few things that surprised
>>> me:
>>>
>>> - Tons of Japanese. Japan is the most mobile-enabled country in the
>>> world so I guess we should have expected to see many searches in Japanese
>>> show up in the top queries. I've left them URL-encoded in the results --
>>> you'll see them as weird lines with % in them.
>>>
>>> - Apparently people search for movies and TV so they can spoil their fun
>>> by reading about them on Wikipedia. Both "movies" and "film" show up in
>>> the top keywords; Iron Man 1, 2, AND 3 all show up in the top search
>>> queries. I didn't expect this was a major use-case, but -- wikigroaning
>>> aside -- it's an interesting fact.
>>>
>>> I'm sure we're only scratching the surface here. This is an exciting
>>> dataset, and I'm sure there's lots more to learn!
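The query and keyword counts described above can be sketched roughly as follows. This is an illustration, not the job that produced the TSVs: the function names are hypothetical, and it assumes the search terms sit in a `q` URL parameter, which holds for some search engines but not all (others use a different parameter name). Only the stop-word list is taken from the message itself.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Stop words as listed in the message above.
STOP_WORDS = {
    "de", "of", "in", "is", "la", "and", "el", "es", "to", "en", "di", "los",
    "le", "da", "se", "las", "les", "il", "du", "a", "i", "o", "y", "e",
}


def search_query(referer, param="q"):
    """Return the decoded, lowercased search query from a referrer URL, or None.

    The parameter name "q" is an assumption; it varies by search engine.
    """
    values = parse_qs(urlparse(referer).query).get(param)
    return values[0].strip().lower() if values else None


def top_terms(referers, n=10):
    """Count whole queries and individual keywords (stop words removed)."""
    queries = Counter()
    keywords = Counter()
    for referer in referers:
        query = search_query(referer)
        if not query:
            continue  # no search query in this referrer
        queries[query] += 1
        keywords.update(w for w in query.split() if w not in STOP_WORDS)
    return queries.most_common(n), keywords.most_common(n)
```

Note that `parse_qs` percent-decodes the value, so a Japanese query comes back as readable text rather than as the URL-encoded form left in the published results.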
>>>
>>> The full results:
>>> - Top Referring Entry Domains:
>>> http://stats.wikimedia.org/kraken-public/webrequest/mobile/views/sessions/mobile_sessions-2013-04-21-top_entry_domains.tsv
>>> - Top Referring Entry Search Queries:
>>> http://stats.wikimedia.org/kraken-public/webrequest/mobile/views/sessions/mobile_sessions-2013-04-21-top_entry_search_queries.tsv
>>> - Top Referring Entry Search Keywords:
>>> http://stats.wikimedia.org/kraken-public/webrequest/mobile/views/sessions/mobile_sessions-2013-04-21-top_entry_keywords.tsv
>>>
>>> Questions are welcome!
>>>
>>>
>>> --
>>> David Schoonover
>>> [email protected]
>>> _______________________________________________
>>> Analytics mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>>
>>>
>>
>>
>> --
>> Maryana Pinchuk
>> Associate Product Manager, Wikimedia Foundation
>> wikimediafoundation.org
>>
>> _______________________________________________
>> Analytics mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
_______________________________________________
Mobile-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mobile-l
