This seems perfect. Is it currently used?

On 17 August 2015 at 18:03, Andrew Otto <[email protected]> wrote:
> BTW, Christian foresaw this issue and wrote this:
> https://github.com/wikimedia/analytics-refinery-source/tree/master/guard
>
> It should be useable for pageviews too, I think. For this issue, a guard
> that made sure that outreach.wikimedia.org never appeared would have been an
> error.
>
>> On Aug 17, 2015, at 14:45, Oliver Keyes <[email protected]> wrote:
>>
>> On 17 August 2015 at 13:48, Joseph Allemandou <[email protected]>
>> wrote:
>>> Hey Oliver,
>>>
>>> The analytics team is responsible for the pageview definition.
>>> When finding issues, sending an email to the analytics mailing list is the
>>> right thing to do :)
>>>
>>
>> Indeed; my point is not about issues reported upstream. My point is
>> that there appears to currently be absolutely no work done to take
>> this (org-level, highest possible priority) KPI and evaluate it every
>> month or every N days to make sure that, even with the gradual
>> accretion of changes to the input data, it is still extracting what we
>> want. It is down to user-reported issues. The problem with this
>> approach is that after 90 days it is impossible to rerun the data; if
>> there is a bug breaking the logs, and it takes more than 90 days to
>> discover it, those logs are simply broken.
>>
>> In addition, discovering these issues requires a very granular
>> understanding of what the pageviews logs are meant to be capturing
>> that most customers simply will not have. It worked in this case
>> primarily because the customer actually /wrote/ the definition ;p.
>>
>> For public transparency: Joseph and I talked on IRC and will be
>> working on ways to validate data and detect these kinds of regressions
>> in advance.
>>
>>> On our end, we could surely do a better job to communicate changes in the
>>> pageview definition code for anybody interested to review/comment/ask for
>>> documentation.
>>> Emails have been sent regularly about updates on the analytics list, except
>>> in the past few months.
>>> We shall get back to that good habit and send notifications with
>>> explanations of the changes.
>>>
>>> Joseph
>>>
>>> On Mon, Aug 17, 2015 at 5:15 PM, Oliver Keyes <[email protected]> wrote:
>>>>
>>>> You should also note that donate-wiki pageviews are making it into the
>>>> counts (again, the definition was designed to exclude these).
>>>>
>>>> Whose job is it to review pageviews and update the definition when
>>>> issues are found?
>>>>
>>>> On 17 August 2015 at 10:32, Oliver Keyes <[email protected]> wrote:
>>>>> Just to clarify; there is no need to ask me before making changes
>>>>> (obviously I find my approval for pageviews changes being sought
>>>>> incredibly flattering, but I am not the only person involved in this
>>>>> project ;p). What I'm more driving towards is directly informing
>>>>> customers when the definition is adapted.
>>>>>
>>>>> On 17 August 2015 at 10:31, Oliver Keyes <[email protected]> wrote:
>>>>>> Excellent; thank you.
>>>>>>
>>>>>> On 17 August 2015 at 04:42, Joseph Allemandou
>>>>>> <[email protected]> wrote:
>>>>>>> Oliver,
>>>>>>>
>>>>>>> It was a mistake from me to add the 'outreach' subdomain without
>>>>>>> asking you.
>>>>>>>
>>>>>>> From a documentation perspective, the analytics team uses that place
>>>>>>> to document changes:
>>>>>>> https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest and I
>>>>>>> didn't know about the up-to-date documentation you sent.
>>>>>>>
>>>>>>> Tickets have been created to both correct the bug and update the
>>>>>>> documentation pages.
>>>>>>>
>>>>>>> Joseph
>>>>>>>
>>>>>>> On Sun, Aug 16, 2015 at 8:47 PM, Oliver Keyes <[email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Ah, I see the problem; someone patched it and never documented it.
>>>>>>>>
>>>>>>>> We have documentation at
>>>>>>>> https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
>>>>>>>> of the generalised filters. There is also a log, on
>>>>>>>> https://meta.wikimedia.org/wiki/Research:Page_view, of changes to the
>>>>>>>> pageview definition.
>>>>>>>>
>>>>>>>> The intent behind both the transparent definition and the log is to
>>>>>>>> ensure that we know what is going /in/ the definition.
>>>>>>>>
>>>>>>>> In this case, somebody has patched the definition
>>>>>>>> (https://github.com/wikimedia/analytics-refinery-source/commit/cc0b6ed7e4f403eaa82235ec6a0f27152b0c2710)
>>>>>>>> to include traffic from outreach.wikimedia.org - a site that was very
>>>>>>>> deliberately and very explicitly excluded from the definition as it
>>>>>>>> was written.
>>>>>>>>
>>>>>>>> There is no explanation of why this change was made, there is no
>>>>>>>> documentation of this change even existing outside the actual
>>>>>>>> Java.... can someone please explain what this is for, and update all
>>>>>>>> the documentation to reflect that? And then could people be very, very
>>>>>>>> clear in future that it is expected there be a log of alterations you
>>>>>>>> make to high-level KPIs beyond the, you know, commit logs.
>>>>>>>>
>>>>>>>> On 16 August 2015 at 14:32, Madhumitha Viswanathan
>>>>>>>> <[email protected]> wrote:
>>>>>>>>> The new one.
>>>>>>>>>
>>>>>>>>> The code that generates it -
>>>>>>>>>
>>>>>>>>> - https://github.com/wikimedia/analytics-refinery/blob/master/hive/pageview/hourly/create_pageview_hourly_table.hql
>>>>>>>>> - https://github.com/wikimedia/analytics-refinery/tree/master/oozie/pageview/hourly
>>>>>>>>>
>>>>>>>>> On Sun, Aug 16, 2015 at 11:01 AM, Oliver Keyes
>>>>>>>>> <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Is the pageviews_hourly table meant to contain pageviews according
>>>>>>>>>> to the new or old definition? If old, where can I find aggregates
>>>>>>>>>> for the new one?
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Oliver Keyes
>>>>>>>>>> Count Logula
>>>>>>>>>> Wikimedia Foundation
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Analytics mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> --Madhu :)
>>>>>>>
>>>>>>> --
>>>>>>> Joseph Allemandou
>>>>>>> Data Engineer @ Wikimedia Foundation
>>>>>>> IRC: joal
--
Oliver Keyes
Count Logula
Wikimedia Foundation

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
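
For illustration only, here is a rough sketch of the kind of check discussed in this thread: fail loudly if a host that the definition is meant to exclude shows up among counted pageviews. It is not the code in the guard directory linked above. The row layout (uri_host, is_pageview), the function names, and the donate.wikimedia.org hostname for "donate-wiki" are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hosts the thread says the pageview definition was designed to exclude.
# The exact hostnames are assumptions for this sketch.
EXCLUDED_HOSTS = frozenset({"outreach.wikimedia.org", "donate.wikimedia.org"})


def count_excluded_pageviews(rows: Iterable[Mapping[str, object]]) -> Counter:
    """Count rows flagged as pageviews whose host is on the excluded list.

    Each row is assumed to be a mapping with a 'uri_host' string and an
    'is_pageview' boolean (hypothetical field names for this example).
    """
    violations: Counter = Counter()
    for row in rows:
        host = str(row.get("uri_host", "")).lower()
        if row.get("is_pageview") and host in EXCLUDED_HOSTS:
            violations[host] += 1
    return violations


def check_guard(rows: Iterable[Mapping[str, object]]) -> None:
    """Raise ValueError if any excluded host was counted as a pageview."""
    violations = count_excluded_pageviews(rows)
    if violations:
        details = ", ".join(f"{h}: {n}" for h, n in sorted(violations.items()))
        raise ValueError(f"excluded hosts counted as pageviews: {details}")


# Made-up sample rows, not real data.
sample = [
    {"uri_host": "en.wikipedia.org", "is_pageview": True},
    {"uri_host": "outreach.wikimedia.org", "is_pageview": True},
    {"uri_host": "donate.wikimedia.org", "is_pageview": False},
]
```

Calling check_guard(sample) raises because the outreach row is flagged as a pageview. Run on a schedule shorter than the 90-day retention window mentioned above, a check like this would surface such a regression while the data can still be rerun.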
