When users look for certain information on Wikipedia, they sometimes reach it through redirects. It does not happen very often, but the number of hits on redirects can be hundreds or even thousands of times higher than the number of hits on the corresponding articles. So if you care about the subjects users were looking for, it does not make sense to separate hits on redirects from hits on the corresponding articles. When calculating data for wikipediatrends.com, we combined the URLs of articles and their redirects exactly as Oliver described. It was not very hard because we did it before summarising and cleaning the raw data.
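As a minimal sketch of that kind of merge (not the actual wikipediatrends.com code), the Python below folds hits on redirect titles into the hits of their target articles. The function name, the page titles, the counts, and the redirect map are all made up for illustration; a real redirect map would have to come from the Wikipedia database or API.

```python
from collections import Counter


def combine_redirect_hits(raw_hits, redirect_map):
    """Fold hits on redirect titles into the hits of their target articles.

    raw_hits: iterable of (title, hits) pairs, one per URL in the raw data.
    redirect_map: dict mapping a redirect title to its target article title.
    Both inputs are hypothetical structures chosen for this sketch.
    """
    combined = Counter()
    for title, hits in raw_hits:
        # Titles missing from the map are treated as articles in their own right.
        target = redirect_map.get(title, title)
        combined[target] += hits
    return dict(combined)


# Illustrative data only: a redirect that gets far more hits than its article.
raw_hits = [
    ("Article_A", 10),
    ("Old_Spelling_Of_A", 2500),
    ("Article_B", 40),
]
redirect_map = {"Old_Spelling_Of_A": "Article_A"}

totals = combine_redirect_hits(raw_hits, redirect_map)
print(totals)
```

This sketch resolves only a single redirect hop and runs on the raw per-URL counts, before any summarising or cleaning.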

On Sun, Apr 27, 2014 at 9:21 PM, Oliver Keyes <[email protected]> wrote:

> The problem with that is that, as Henrik said, it works on the basis of
> URLs, not page names. The only way to discover "X is a redirect to Y" would
> be to prod the database for that information, for every unique URL.
>
> On 27 April 2014 05:36, Jane Darnell <[email protected]> wrote:
>
>> Henrik,
>>
>> I don't know about the page history part of this question, but going
>> forward it would be nice to offer an extra option to include hits on
>> all incoming redirects though. There are two issues with this. The
>> first is when the name needs disambiguation and gets split out into
>> towns, provinces or countries (for places) or gets split out into
>> occupations (for people). The second is when wikipedians go through
>> with bots and "correct" links. I assume they do this so that the page
>> hits will become more representative, but I often work on old names
>> and I try to preserve original spelling (especially by older authors)
>> to increase the "findability". To do this I create a lot of redirects
>> based on older spellings to use in pages where the older spelling is
>> used in a reference. I've noticed "redirect" corrections in cases
>> where there is no disambiguation needed, and so I think offering an
>> option for all redirects might help stop that behavior.
>>
>> Jane
>>
>> 2014-04-27 12:34 GMT+02:00, Henrik Abelsson <[email protected]>:
>> >
>> > On 2014-04-27 08:45, ENWP Pine wrote:
>> >
>> >> * I think it would be desirable to add an https option to
>> >> stats.grok.se so that viewers' interests in page readership statistics
>> >> are more private.
>> >>
>> > Hm, why not? I'll request a certificate and start serving https also. I
>> > hadn't really thought of page readership statistics as something all
>> > that sensitive, but I don't see any downside to also serving https.
>> >
>> >> * There is an issue in the statistics given at [1]. As you can see
>> >> from [2] editors created and edited the project page on days when
>> >> stats.grok.se said there were no pageviews. This may be the result of
>> >> a page move [3] [4] and the pre-move views were not integrated into
>> >> the results shown in [1]. Is this the expected and desired behavior?
>> >> From my point of view as a Signpost author, this is undesirable as we
>> >> try to track our readership statistics. I think it is the case that if
>> >> page A is moved to new page B then the statistics for page A should be
>> >> integrated into those for page B and a notice should be given to the
>> >> viewer that the statistics for page B includes those from page A which
>> >> was moved on date X. This problem may affect other pages that are the
>> >> subject of mergers. I think it would be the case that if page A is
>> >> merged into page B then we would want some notice to appear on
>> >> stats.grok.se alerting the viewer that there was a merger, the date of
>> >> the merger, and offering the viewer a way to select statistics with
>> >> and without the historical information from page A as they look at the
>> >> viewership statistics for page B.
>> >>
>> > That would indeed be better. However, the statistics data tracks URLs
>> > rather than pages and it's computationally expensive to look up the page
>> > history and what URLs it has been accessible through. The average
>> > throughput of view statistics is that some tens of thousands of entries
>> > are added per minute 24/7. One could perhaps do it as the data was
>> > requested, but that would mean making several round-trips to the WMF
>> > servers to look up the history of all moves and correlate URLs across
>> > time. It's certainly possible to build a tool that does that on top of
>> > the stats.grok.se and wikipedia APIs though.
>> >
>> > -henrik
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation

--
Thank you.

Alex Druk
[email protected]
(775) 237-8550 Google voice
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
