On 17 August 2015 at 13:48, Joseph Allemandou <[email protected]> wrote:
> Hey Oliver,
>
> The analytics team is responsible for the pageview definition.
> When finding issues, sending an email to the analytics mailing list is the
> right thing to do :)
>

Indeed; my point is not about issues reported upstream. My point is
that there appears to currently be absolutely no work done to take
this (org-level, highest possible priority) KPI and evaluate it every
month or every N days to make sure that, even with the gradual
accretion of changes to the input data, it is still extracting what we
want. It is down to user-reported issues. The problem with this
approach is that after 90 days it is impossible to rerun the data; if
there is a bug breaking the logs, and it takes more than 90 days to
discover it, those logs are simply broken.

In addition, discovering these issues requires a very granular
understanding of what the pageviews logs are meant to be capturing
that most customers simply will not have. It worked in this case
primarily because the customer actually /wrote/ the definition ;p.

For public transparency: Joseph and I talked on IRC and will be
working on ways to validate data and detect these kinds of regressions
in advance.
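
As a rough illustration of the kind of check meant here (a sketch only, not something agreed on in this thread): scan aggregated pageview rows for hosts the definition is supposed to exclude, and report anything that leaked through. The row shape, the function name and the exclusion list are all assumptions made for the example; the two hostnames are my reading of the outreach and donate wikis mentioned in this thread, and the real check would run against the actual tables well inside the 90-day window.

```python
from collections import Counter

# Hypothetical exclusion list, assumed for illustration only.
EXCLUDED_HOSTS = {"outreach.wikimedia.org", "donate.wikimedia.org"}


def find_excluded_traffic(rows, excluded=EXCLUDED_HOSTS):
    """Sum counted views per excluded host found in the aggregates.

    rows is an iterable of (hostname, view_count) pairs; this shape is an
    assumption, not the schema of any real table.
    """
    leaked = Counter()
    for host, views in rows:
        normalised = host.lower()
        if normalised in excluded:
            leaked[normalised] += views
    return dict(leaked)


# Made-up sample rows, purely to show the call.
sample = [
    ("en.wikipedia.org", 1200),
    ("outreach.wikimedia.org", 7),
    ("donate.wikimedia.org", 3),
    ("outreach.wikimedia.org", 5),
]
leaks = find_excluded_traffic(sample)
```

An empty result means none of the listed hosts were counted; anything else is a candidate regression to look at before the raw logs expire.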

> On our end, we could surely do a better job of communicating changes in the
> pageview definition code for anybody interested to review/comment/ask for
> documentation.
> Emails have been sent regularly about updates on the analytics list, except
> in the past few months.
> We shall get back to that good habit and send notifications with
> explanations of the changes.
>
> Joseph
>
>
>
>
> On Mon, Aug 17, 2015 at 5:15 PM, Oliver Keyes <[email protected]> wrote:
>>
>> You should also note that donate-wiki pageviews are making it into the
>> counts (again, the definition was designed to exclude these).
>>
>> Whose job is it to review pageviews and update the definition when
>> issues are found?
>>
>> On 17 August 2015 at 10:32, Oliver Keyes <[email protected]> wrote:
>> > Just to clarify; there is no need to ask me before making changes
>> > (obviously I find my approval for pageviews changes being sought
>> > incredibly flattering, but I am not the only person involved in this
>> > project ;p). What I'm more driving towards is directly informing
>> > customers when the definition is adapted.
>> >
>> > On 17 August 2015 at 10:31, Oliver Keyes <[email protected]> wrote:
>> >> Excellent; thank you.
>> >>
>> >> On 17 August 2015 at 04:42, Joseph Allemandou
>> >> <[email protected]> wrote:
>> >>> Oliver,
>> >>>
>> >>> It was a mistake on my part to add the 'outreach' subdomain without
>> >>> asking you.
>> >>>
>> >>> From a documentation perspective, the analytics team uses this page to
>> >>> document changes:
>> >>> https://wikitech.wikimedia.org/wiki/Analytics/Data/Webrequest and I didn't
>> >>> know about the up-to-date documentation you sent.
>> >>>
>> >>> Tickets have been created to both correct the bug and update the
>> >>> documentation pages.
>> >>>
>> >>> Joseph
>> >>>
>> >>>
>> >>>
>> >>> On Sun, Aug 16, 2015 at 8:47 PM, Oliver Keyes <[email protected]>
>> >>> wrote:
>> >>>>
>> >>>> Ah, I see the problem; someone patched it and never documented it.
>> >>>>
>> >>>> We have documentation at
>> >>>> https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
>> >>>> of the generalised filters. There is also a log, on
>> >>>> https://meta.wikimedia.org/wiki/Research:Page_view, of changes to the
>> >>>> pageview definition.
>> >>>>
>> >>>> The intent behind both the transparent definition and the log is to
>> >>>> ensure that we know what is going /in/ the definition.
>> >>>>
>> >>>> In this case, somebody has patched the definition
>> >>>> (https://github.com/wikimedia/analytics-refinery-source/commit/cc0b6ed7e4f403eaa82235ec6a0f27152b0c2710)
>> >>>> to include traffic from outreach.wikimedia.org - a site that was very
>> >>>> deliberately and very explicitly excluded from the definition as it
>> >>>> was written.
>> >>>>
>> >>>> There is no explanation of why this change was made, there is no
>> >>>> documentation of this change even existing outside the actual
>> >>>> Java.... can someone please explain what this is for, and update all the
>> >>>> documentation to reflect that? And then could people be very, very
>> >>>> clear in future that it is expected there be a log of alterations you
>> >>>> make to high-level KPIs beyond the, you know, commit logs.
>> >>>>
>> >>>> On 16 August 2015 at 14:32, Madhumitha Viswanathan
>> >>>> <[email protected]> wrote:
>> >>>> > The new one.
>> >>>> >
>> >>>> > The code that generates it -
>> >>>> >
>> >>>> > - https://github.com/wikimedia/analytics-refinery/blob/master/hive/pageview/hourly/create_pageview_hourly_table.hql
>> >>>> > - https://github.com/wikimedia/analytics-refinery/tree/master/oozie/pageview/hourly
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> > On Sun, Aug 16, 2015 at 11:01 AM, Oliver Keyes
>> >>>> > <[email protected]>
>> >>>> > wrote:
>> >>>> >>
>> >>>> >> Is the pageviews_hourly table meant to contain pageviews according to
>> >>>> >> the new or old definition? If old, where can I find aggregates for the
>> >>>> >> new one?
>> >>>> >>
>> >>>> >> --
>> >>>> >> Oliver Keyes
>> >>>> >> Count Logula
>> >>>> >> Wikimedia Foundation
>> >>>> >>
>> >>>> >> _______________________________________________
>> >>>> >> Analytics mailing list
>> >>>> >> [email protected]
>> >>>> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> > --
>> >>>> > --Madhu :)
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Joseph Allemandou
>> >>> Data Engineer @ Wikimedia Foundation
>> >>> IRC: joal



-- 
Oliver Keyes
Count Logula
Wikimedia Foundation

