> Also, just out of curiosity and to better understand the issue, what
> would be an example of a real life request URL that results in such a
> "no page title found" error when extracting the title?

Special page requests, for example.
Normally pages like "Special:Blah" are "actions", not pages themselves. We
do not count those as pageviews, with the notable exception of search
requests (as they do provide content). So a request like
"Special:Search: Blah-Blah" would be an example of a pageview with title
"-" in the pageview_hourly table.

On Mon, Dec 5, 2016 at 3:15 PM, Tilman Bayer <tba...@wikimedia.org> wrote:
> On Mon, Nov 14, 2016 at 12:25 PM, Nuria Ruiz <nu...@wikimedia.org> wrote:
> > This is documented now here:
> >
> > https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Gotchas
> Thanks for the documentation. Does this only affect data provided by
> the API, or also the page_title field in the pageview_hourly table,
> i.e. the source of the API data?
>
> In the latter case, please also add a note to the "known problems" at
> https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly .
> (This is the canonical place for documenting such issues - thanks for
> making this explicit at
> https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Issues_with_data .
> Separately, for pageview definition changes there is also
> https://meta.wikimedia.org/wiki/Research:Page_view#Change_log . No
> objections of course if the Analytics team commits to keeping the
> information up to date in all three places.)
>
> Also, just out of curiosity and to better understand the issue, what
> would be an example of a real life request URL that results in such a
> "no page title found" error when extracting the title?
>
> >
> > On Tue, Nov 8, 2016 at 7:25 AM, Vipul Naik <vipulna...@gmail.com> wrote:
> >>
> >> Hi Joseph,
> >>
> >> Thanks for the clarification.
> >>
> >> Any ideas why this number is much higher for some months? In particular,
> >> on desktop, it's high in the months of July to September 2015 (around 10
> >> million, compared to the usual 5 million) and then high again in October
> >> 2016 (45 million, about 10x the usual value).
> For context, https://en.wikipedia.org/wiki/- was the 8th most viewed
> page on all projects from May to October 2015, see footnote [1] at
> https://phabricator.wikimedia.org/T117945 (that bug, flagged as "High"
> Analytics priority since almost a year, is about a separate but
> similar issue)
>
> >>
> >> Data is from
> >> http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-&allmonths=allmonths&drilldown=all
> >> which summarizes results from the Wikimedia API (and stats.grok.se for
> >> data before July 2015).
> >>
> >> Vipul
> >>
> >> On Tue, Nov 8, 2016 at 3:46 AM, Joseph Allemandou
> >> <jalleman...@wikimedia.org> wrote:
> >>>
> >>> Hello Issa,
> >>>
> >>> Thank you for your question.
> >>> The very high number of views of the "-" page is explained by this dash
> >>> value being used as a special value for "no page title found" when
> >>> extracting titles from urls.
> >>> We definitely should document this in the API, creating this task:
> >>> https://phabricator.wikimedia.org/T150249
> >>> Best
> >>> Joseph
> >>>
> >>>
> >>> On Tue, Nov 8, 2016 at 12:28 AM, Issa Rice <ricei...@gmail.com> wrote:
> >>>>
> >>>> Dear Analytics Mailing List,
> >>>>
> >>>> Recently while querying pageviews of various pages, I discovered that
> >>>> the page whose title is a single hyphen character (i.e. with the title
> >>>> "-", with URL <https://en.wikipedia.org/wiki/->, which redirects to
> >>>> <https://en.wikipedia.org/wiki/Hyphen-minus>) receives an unusually
> >>>> high number of pageviews under the Pageview API. Taking October 2015
> >>>> as an example, the page received 5.4 million pageviews during that
> >>>> month according to the API:
> >>>>
> >>>> <https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/desktop/user/-/daily/20151001/20151031>.
> >>>>
> >>>> However, according to stats.grok.se (which was still operational in
> >>>> the same month), the page received only 1209 pageviews:
> >>>> <http://stats.grok.se/en/201510/->.
> >>>>
> >>>> Looking at the tabulation of pageviews on Wikipedia Views, the
> >>>> increase in pageviews for this page coincides with the change to the
> >>>> Pageview API in July 2015:
> >>>>
> >>>> <http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-&allmonths=allmonths&drilldown=all>.
> >>>>
> >>>> As I understand, page titles must be URL-encoded before the query,
> >>>> but the URL-encoding of "-" is itself.
> >>>>
> >>>> I looked at the API documentation but did not see this behavior
> >>>> listed, so I am wondering where these numbers are coming from.
> >>>>
> >>>> Best regards,
> >>>> Issa
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Analytics mailing list
> >>>> Analytics@lists.wikimedia.org
> >>>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>>
> >>>
> >>> --
> >>> Joseph Allemandou
> >>> Data Engineer @ Wikimedia Foundation
> >>> IRC: joal
> >>>
>
> --
> Tilman Bayer
> Senior Analyst
> Wikimedia Foundation
> IRC (Freenode): HaeB
>
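
As a rough illustration of the behaviour discussed in this thread, here is a
minimal Python sketch. It is not the Analytics team's actual extraction code:
the function names extract_title and per_article_url, and the rule of
returning "no title" for anything outside /wiki/<title> or for any Special:
page, are assumptions made only for this example. It shows two things from
the thread: a "-" sentinel makes unresolved requests indistinguishable from
genuine views of the page titled "-", and URL-encoding "-" leaves it
unchanged.

```python
from urllib.parse import quote, unquote, urlsplit

# Sentinel described in the thread as meaning "no page title found".
NO_TITLE = "-"


def extract_title(url: str) -> str:
    """Toy title extractor (hypothetical, not the production logic).

    Returns the decoded title after /wiki/, or the "-" sentinel when no
    title can be determined or the request targets a Special: page.
    """
    path = urlsplit(url).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        return NO_TITLE
    title = unquote(path[len(prefix):])
    if not title or title.startswith("Special:"):
        return NO_TITLE
    return title


def per_article_url(title: str, start: str, end: str,
                    project: str = "en.wikipedia",
                    access: str = "desktop",
                    agent: str = "user") -> str:
    """Build a per-article query URL in the shape quoted in the thread."""
    base = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
    encoded = quote(title, safe="")  # "-" is unreserved, so it stays "-"
    return f"{base}/{project}/{access}/{agent}/{encoded}/daily/{start}/{end}"


if __name__ == "__main__":
    # A real view of the "-" page and a search request collapse to the
    # same title, so their counts cannot be separated afterwards.
    print(extract_title("https://en.wikipedia.org/wiki/-"))
    print(extract_title("https://en.wikipedia.org/wiki/Special:Search?search=foo"))
    print(per_article_url("-", "20151001", "20151031"))
```

Calling per_article_url("-", "20151001", "20151031") rebuilds the query URL
from Issa's original message, which is why the API returns the combined
count for the real page and the sentinel.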