On 17 August 2015 at 13:48, Joseph Allemandou <[email protected]> wrote:
> Hey Oliver,
>
> The analytics team is responsible for the pageview definition.
> When finding issues, sending an email to the analytics mailing list is the
> right thing to do :)
>
Indeed; my point is not about issues reported upstream. My point is that
there appears to currently be absolutely no work done to take this
(org-level, highest possible priority) KPI and evaluate it every month or
every N days to make sure that, even with the gradual accretion of changes
to the input data, it is still extracting what we want. It is down to
user-reported issues.

The problem with this approach is that after 90 days it is impossible to
rerun the data; if there is a bug breaking the logs, and it takes more
than 90 days to discover it, those logs are simply broken. In addition,
discovering these issues requires a very granular understanding of what
the pageviews logs are meant to be capturing that most customers simply
will not have. It worked in this case primarily because the customer
actually /wrote/ the definition ;p.

For public transparency: Joseph and I talked on IRC and will be working on
ways to validate data and detect these kinds of regressions in advance.

> On our end, we could surely do a better job to communicate changes in the
> pageview definition code for anybody interested to review/comment/ask for
> documentation.
> Emails have been sent regularly about updates on the analytics list, except
> in the past few months.
> We shall get back to that good habit and send notifications with
> explanations of the changes.
>
> Joseph
>
> On Mon, Aug 17, 2015 at 5:15 PM, Oliver Keyes <[email protected]> wrote:
>>
>> You should also note that donate-wiki pageviews are making it into the
>> counts (again, the definition was designed to exclude these).
>>
>> Whose job is it to review pageviews and update the definition when
>> issues are found?
>>
>> On 17 August 2015 at 10:32, Oliver Keyes <[email protected]> wrote:
>> > Just to clarify; there is no need to ask me before making changes
>> > (obviously I find my approval for pageviews changes being sought
>> > incredibly flattering, but I am not the only person involved in this
>> > project ;p).
>> > What I'm more driving towards is directly informing
>> > customers when the definition is adapted.
>> >
>> > On 17 August 2015 at 10:31, Oliver Keyes <[email protected]> wrote:
>> >> Excellent; thank you.
>> >>
>> >> On 17 August 2015 at 04:42, Joseph Allemandou
>> >> <[email protected]> wrote:
>> >>> Oliver,
>> >>>
>> >>> It was a mistake from me to add the 'outreach' subdomain without
>> >>> asking you.
>> >>>
>> >>> From a documentation perspective, the analytics team uses that place to
>> >>> document changes:
>> >>> https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest and I
>> >>> didn't know about up-to-date documentation you sent.
>> >>>
>> >>> Tickets have been created to both correct the bug and update the
>> >>> documentation pages.
>> >>>
>> >>> Joseph
>> >>>
>> >>> On Sun, Aug 16, 2015 at 8:47 PM, Oliver Keyes <[email protected]>
>> >>> wrote:
>> >>>>
>> >>>> Ah, I see the problem; someone patched it and never documented it.
>> >>>>
>> >>>> We have documentation at
>> >>>> https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
>> >>>> of the generalised filters. There is also a log, on
>> >>>> https://meta.wikimedia.org/wiki/Research:Page_view, of changes to the
>> >>>> pageview definition.
>> >>>>
>> >>>> The intent behind both the transparent definition and the log is to
>> >>>> ensure that we know what is going /in/ the definition.
>> >>>>
>> >>>> In this case, somebody has patched the definition
>> >>>> (https://github.com/wikimedia/analytics-refinery-source/commit/cc0b6ed7e4f403eaa82235ec6a0f27152b0c2710)
>> >>>> to include traffic from outreach.wikimedia.org - a site that was very
>> >>>> deliberately and very explicitly excluded from the definition as it
>> >>>> was written.
>> >>>>
>> >>>> There is no explanation of why this change was made, there is no
>> >>>> documentation of this change even existing outside the actual
>> >>>> Java....
>> >>>> can someone please explain what this is for, and update all the
>> >>>> documentation to reflect that? And then could people be very, very
>> >>>> clear in future that it is expected there be a log of alterations you
>> >>>> make to high-level KPIs beyond the, you know, commit logs.
>> >>>>
>> >>>> On 16 August 2015 at 14:32, Madhumitha Viswanathan
>> >>>> <[email protected]> wrote:
>> >>>> > The new one.
>> >>>> >
>> >>>> > The code that generates it -
>> >>>> >
>> >>>> > - https://github.com/wikimedia/analytics-refinery/blob/master/hive/pageview/hourly/create_pageview_hourly_table.hql
>> >>>> > - https://github.com/wikimedia/analytics-refinery/tree/master/oozie/pageview/hourly
>> >>>> >
>> >>>> > On Sun, Aug 16, 2015 at 11:01 AM, Oliver Keyes
>> >>>> > <[email protected]> wrote:
>> >>>> >>
>> >>>> >> Is the pageviews_hourly table meant to contain pageviews according
>> >>>> >> to the new or old definition? If old, where can I find aggregates
>> >>>> >> for the new one?
>> >>>> >>
>> >>>> >> --
>> >>>> >> Oliver Keyes
>> >>>> >> Count Logula
>> >>>> >> Wikimedia Foundation
>> >>>> >>
>> >>>> >> _______________________________________________
>> >>>> >> Analytics mailing list
>> >>>> >> [email protected]
>> >>>> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>>> >
>> >>>> > --
>> >>>> > --Madhu :)
>> >>>
>> >>> --
>> >>> Joseph Allemandou
>> >>> Data Engineer @ Wikimedia Foundation
>> >>> IRC: joal

--
Oliver Keyes
Count Logula
Wikimedia Foundation

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
