Re: [Analytics] [Ops] How best to accurately record page interactions in Page Previews

Nuria Ruiz Thu, 18 Jan 2018 14:58:21 -0800

> Adding to this, one thing to consider is DNT - is there a way to invoke EL
> so that such traffic is appropriately imputed or something?


I am not sure what you are asking...

On Thu, Jan 18, 2018 at 1:57 PM, Adam Baso <[email protected]> wrote:

> (I'd defer to the Readers Web team with Tilman on whether country
> extracted from the cookie would be sufficient.)
>
> Adding to this, one thing to consider is DNT - is there a way to invoke EL
> so that such traffic is appropriately imputed or something?
>
> -Adam
>
> On Thu, Jan 18, 2018 at 2:13 PM, Andrew Otto <[email protected]> wrote:
>
>> >  In particular, will we be able to sort by country, OS, Browser, etc?
>> OS, Browser, yes.  User Agent parsing is done by the EventLogging
>> processors.
>>
>> Country, not quite as easily, since EventLogging does not include client
>> IP addresses.  We could consider putting this back in somehow, or, I’ve
>> also heard that there is a geocoded country cookie that Varnish sets
>> that the browser could send back as part of the event.  Is country enough
>> geo detail?
>>
>>
>>
>> On Thu, Jan 18, 2018 at 2:30 PM, Olga Vasileva <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> I just want to confirm that the proposed method using EventLogging will
>>> allow us to gather data in a similar fashion to the webrequest table.  In
>>> particular, will we be able to sort by country, OS, Browser, etc?  Our goal
>>> here is to be able to consider the new page interactions metric on the same
>>> level and with the same depth as pageviews.
>>>
>>> Thanks!
>>>
>>> - Olga
>>>
>>> On Thu, Jan 18, 2018 at 12:46 PM Andrew Otto <[email protected]> wrote:
>>>
>>>> > the beacon puts the record into the webrequest table and from there
>>>> > it would only take some trivial preprocessing
>>>> ‘Trivial’ preprocessing that has to look through 150K requests per
>>>> second! This is a lot of work!
>>>>
>>>> > tracking of events is better done on an event based system and EL is
>>>> > such a system.
>>>> I agree with this too.  We really want to discourage people from trying
>>>> to measure things by searching through the huge haystack of all
>>>> webrequests.  To measure something, you should emit an event if you can.
>>>> If it were practical, I’d prefer that we did this for pageviews as well.
>>>> Currently, we need a complicated definition of what a pageview is, which
>>>> really only exists in the Java implementation in the Hadoop cluster.  It’d
>>>> be much clearer if app developers had a way to define for themselves what
>>>> counts as a pageview, and emit that as an event.
>>>>
>>>> This should be the approach that people take when they want to measure
>>>> something new.  Emit an event!  This event will get its own Kafka topic
>>>> (you can consume this to do whatever you like with it), and be refined into
>>>> its own Hive table.
>>>>
>>>> > I don’t want to have to create that chart and export one dataset
>>>> > from pageviews and one dataset from eventlogging to do that.
>>>> If you also design your schema nicely
>>>> <https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Schema_Guidelines>,
>>>> it will be easily importable into Druid and usable in Pivot and Superset,
>>>> alongside pageviews.  We’re working on getting nice schemas automatically
>>>> imported into Druid <https://gerrit.wikimedia.org/r/#/c/386882/>.
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jan 18, 2018 at 11:16 AM, Nuria Ruiz <[email protected]> wrote:
>>>>
>>>>> Gergo,
>>>>>
>>>>> >while EventLogging data gets stored in a different, unrelated way
>>>>> Not really. This has changed quite a bit over the last two quarters.
>>>>> EventLogging data is now preprocessed and refined much like
>>>>> webrequest data is. You can build a dashboard on top of some
>>>>> EventLogging schemas in Superset in the same way you build a
>>>>> dashboard that displays pageview data in Superset.
>>>>>
>>>>> See dashboards in Superset (user account required).
>>>>>
>>>>> https://superset.wikimedia.org/superset/dashboard/7/?preselect_filters=%7B%7D
>>>>>
>>>>> And (again, user account required) EL data in Druid, the very same data we
>>>>> are talking about, page previews:
>>>>>
>>>>> https://pivot.wikimedia.org/#tbayer_popups
>>>>>
>>>>>
>>>>> >I was going to make the point that #2 already has a processing
>>>>> >pipeline established whereas #1 doesn't.
>>>>> This is incorrect: we mark as "preview" the data that we want to exclude
>>>>> from processing, see:
>>>>> https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/PageviewDefinition.java#L144
>>>>> The naming is unfortunate, but previews are really "preloads", that is,
>>>>> requests we make (and cache locally) that may or may not be shown to users.
>>>>>
>>>>>
>>>>> But again, tracking of events is better done on an event based system
>>>>> and EL is such a system.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Analytics mailing list
>>>>> [email protected]
>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>
>>>>>
>>>
>>>
>>> --
>>> Olga Vasileva // Product Manager // Reading Web Team
>>> https://wikimediafoundation.org/
>>
>
