>Also, just out of curiosity and to better understand the issue, what
>would be an example of a real life request URL that results in such a
>"no page title found" error when extracting the title?
Special page requests, for example.

Normally, pages like "Special:Blah" are "actions", not pages themselves. We
do not count those as pageviews, with the notable exception of Search
requests (as they do provide content). So a request like "Special:Search:
Blah-Blah" would be an example of a pageview with title "-" in the
pageview_hourly table.
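
To make the mechanism concrete, here is a minimal Python sketch of how a
title extractor might fall back to "-" when a URL carries no title. This is
not the actual Analytics code: the function name extract_title, the two URL
shapes it recognises, and the index.php?search= form used for the search
request are all assumptions made for illustration.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Sentinel described in this thread for "no page title found".
UNKNOWN_TITLE = "-"


def extract_title(url: str) -> str:
    """Hypothetical helper: return the page title found in a URL, else "-"."""
    parts = urlsplit(url)

    # Path form (assumed): /wiki/Some_Title
    if parts.path.startswith("/wiki/"):
        title = unquote(parts.path[len("/wiki/"):])
        if title:
            return title

    # Query form (assumed): /w/index.php?title=Some_Title
    titles = parse_qs(parts.query).get("title", [])
    if titles:
        return titles[0]

    # Nothing usable was found, so fall back to the sentinel.
    return UNKNOWN_TITLE


# A search request with no title parameter falls through to the sentinel...
print(extract_title("https://en.wikipedia.org/w/index.php?search=Blah-Blah"))
# ...and is then indistinguishable from a genuine view of the "-" page.
print(extract_title("https://en.wikipedia.org/wiki/-"))
```

Under these assumptions both calls return the same string, which is why the
counts for the real "-" page and for title-less requests end up merged.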



On Mon, Dec 5, 2016 at 3:15 PM, Tilman Bayer <tba...@wikimedia.org> wrote:

> On Mon, Nov 14, 2016 at 12:25 PM, Nuria Ruiz <nu...@wikimedia.org> wrote:
> > This is documented now here:
> >
> > https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Gotchas
> Thanks for the documentation. Does this only affect data provided by
> the API, or also the page_title
> field in the pageview_hourly table, i.e. the source of the API data?
>
> In the latter case, please also add a note to the "known problems" at
> https://wikitech.wikimedia.org/wiki/Analytics/Data/Pageview_hourly .
> (This is the canonical place for documenting such issues - thanks for
> making this explicit at
> https://wikitech.wikimedia.org/wiki/Analytics/PageviewAPI#Issues_with_data
> .
> Separately, for pageview definition changes there is also
> https://meta.wikimedia.org/wiki/Research:Page_view#Change_log . No
> objections of course if the Analytics team commits to keeping the
> information up to date in all three places.)
>
> Also, just out of curiosity and to better understand the issue, what
> would be an example of a real life request URL that results in such a
> "no page title found" error when extracting the title?
> >
> > On Tue, Nov 8, 2016 at 7:25 AM, Vipul Naik <vipulna...@gmail.com> wrote:
> >>
> >> Hi Joseph,
> >>
> >> Thanks for the clarification.
> >>
> >> Any ideas why this number is much higher for some months? In particular,
> >> on desktop, it's high in the months of July to September 2015 (around 10
> >> million, compared to the usual 5 million) and then high again in October
> >> 2016 (45 million, about 10x the usual value).
> For context, https://en.wikipedia.org/wiki/- was the 8th most viewed
> page on all projects from May to October 2015, see footnote [1] at
> https://phabricator.wikimedia.org/T117945 (that bug, flagged as "High"
> Analytics priority for almost a year, is about a separate but
> similar issue).
>
> >>
> >> Data is from
> >> http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-&allmonths=allmonths&drilldown=all
> >> which summarizes results from the Wikimedia API (and stats.grok.se for
> >> data before July 2015).
> >>
> >> Vipul
> >>
> >> On Tue, Nov 8, 2016 at 3:46 AM, Joseph Allemandou
> >> <jalleman...@wikimedia.org> wrote:
> >>>
> >>> Hello Issa,
> >>>
> >>> Thank you for your question.
> >>> The very high number of views of the "-" page is explained by this dash
> >>> value being used as a special value for "no page title found" when
> >>> extracting titles from urls.
> >>> We definitely should document this in the API, creating this task:
> >>> https://phabricator.wikimedia.org/T150249
> >>> Best
> >>> Joseph
> >>>
> >>>
> >>> On Tue, Nov 8, 2016 at 12:28 AM, Issa Rice <ricei...@gmail.com> wrote:
> >>>>
> >>>> Dear Analytics Mailing List,
> >>>>
> >>>> Recently while querying pageviews of various pages, I discovered that
> >>>> the page whose title is a single hyphen character (i.e. with the title
> >>>> "-", with URL <https://en.wikipedia.org/wiki/->, which redirects to
> >>>> <https://en.wikipedia.org/wiki/Hyphen-minus>) receives an unusually
> >>>> high number of pageviews under the Pageview API. Taking October 2015
> >>>> as an example, the page received 5.4 million pageviews during that
> >>>> month according to the API:
> >>>>
> >>>> <https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/desktop/user/-/daily/20151001/20151031>.
> >>>>
> >>>> However, according to stats.grok.se (which was still operational in
> >>>> the same month), the page received only 1209 pageviews:
> >>>> <http://stats.grok.se/en/201510/->.
> >>>>
> >>>> Looking at the tabulation of pageviews on Wikipedia Views, the
> >>>> increase in pageviews for this page coincides with the change to the
> >>>> Pageview API in July 2015:
> >>>>
> >>>> <http://wikipediaviews.org/displayviewsformultiplemonths.php?page=-&allmonths=allmonths&drilldown=all>.
> >>>>
> >>>> As I understand, page titles must be URL-encoded before the query,
> >>>> but the URL-encoding of "-" is itself.
> >>>>
> >>>> I looked at the API documentation but did not see this behavior
> >>>> listed, so I am wondering where these numbers are coming from.
> >>>>
> >>>> Best regards,
> >>>> Issa
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Analytics mailing list
> >>>> Analytics@lists.wikimedia.org
> >>>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Joseph Allemandou
> >>> Data Engineer @ Wikimedia Foundation
> >>> IRC: joal
> >>>
> >>>
> >>
> >>
> >>
> >
> >
> >
>
>
>
> --
> Tilman Bayer
> Senior Analyst
> Wikimedia Foundation
> IRC (Freenode): HaeB
>
>
