(I'd defer to the Readers Web team with Tilman on whether country extracted from the cookie would be sufficient.)
Adding to this, one thing to consider is DNT - is there a way to invoke EL so that such traffic is appropriately imputed or something?

-Adam

On Thu, Jan 18, 2018 at 2:13 PM, Andrew Otto <[email protected]> wrote:

> > In particular, will we be able to sort by country, OS, Browser, etc?
> OS, Browser, yes. User Agent parsing is done by the EventLogging
> processors.
>
> Country not quite as easily, as EventLogging does not include client
> IP addresses. We could consider putting this back in somehow, or, I’ve
> also heard that there is a geocoded country cookie that varnish will set
> that the browser could send back as part of the event. Is country enough
> geo detail?
>
> On Thu, Jan 18, 2018 at 2:30 PM, Olga Vasileva <[email protected]>
> wrote:
>
>> Hi all,
>>
>> I just want to confirm that the proposed method using Eventlogging will
>> allow us to gather data in a similar fashion to the web request table. In
>> particular, will we be able to sort by country, OS, Browser, etc? Our goal
>> here is to be able to consider the new page interactions metric on the same
>> level and with the same depth as pageviews.
>>
>> Thanks!
>>
>> - Olga
>>
>> On Thu, Jan 18, 2018 at 12:46 PM Andrew Otto <[email protected]> wrote:
>>
>>> > the beacon puts the record into the webrequest table and from there
>>> it would only take some trivial preprocessing
>>> ‘Trivial’ preprocessing that has to look through 150K requests per
>>> second! This is a lot of work!
>>>
>>> > tracking of events is better done on an event based system and EL is
>>> such a system.
>>> I agree with this too. We really want to discourage people from trying
>>> to measure things by searching through the huge haystack of all
>>> webrequests. To measure something, you should emit an event if you can.
>>> If it were practical, I’d prefer that we did this for pageviews as well.
>>> Currently, we need a complicated definition of what a pageview is, which
>>> really only exists in the Java implementation in the Hadoop cluster. It’d
>>> be much clearer if app developers had a way to define themselves what
>>> counts as a pageview, and emit that as an event.
>>>
>>> This should be the approach that people take when they want to measure
>>> something new. Emit an event! This event will get its own Kafka topic
>>> (you can consume this to do whatever you like with it), and be refined into
>>> its own Hive table.
>>>
>>> > I don’t want to have to create that chart and export one dataset
>>> from pageviews and one dataset from eventlogging to do that.
>>> If you also design your schema nicely
>>> <https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Schema_Guidelines>,
>>> it will be easily importable into Druid and usable in Pivot and Superset,
>>> alongside of pageviews. We’re working on getting nice schemas automatically
>>> imported into druid <https://gerrit.wikimedia.org/r/#/c/386882/>.
>>>
>>> On Thu, Jan 18, 2018 at 11:16 AM, Nuria Ruiz <[email protected]>
>>> wrote:
>>>
>>>> Gergo,
>>>>
>>>> >while EventLogging data gets stored in a different, unrelated way
>>>> Not really, This has changed quite a bit as of the last two quarters.
>>>> Eventlogging data as of recent gets preprocessed and refined similar to how
>>>> webrequest data is preprocessed and refined. You can have a dashboard on
>>>> top of some eventlogging schemas on superset in the same way you have a
>>>> dashboard that displays pageview data on superset.
>>>>
>>>> See dashboards on superset (user required).
>>>>
>>>> https://superset.wikimedia.org/superset/dashboard/7/?preselect_filters=%7B%7D
>>>>
>>>> And (again, user required) EL data on druid, this very same data we are
>>>> talking about, page previews:
>>>>
>>>> https://pivot.wikimedia.org/#tbayer_popups
>>>>
>>>> >I was going to make the point that #2 already has a processing
>>>> pipeline established whereas #1 doesn't.
>>>> This is incorrect, we mark as "preview" data that we want to exclude
>>>> from processing, see:
>>>> https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/PageviewDefinition.java#L144
>>>> Naming is unfortunate but previews are really "preloads" as in requests
>>>> we make (and cache locally) and maybe shown to users or not.
>>>>
>>>> But again, tracking of events is better done on an event based system
>>>> and EL is such a system.
>>>>
>>>> _______________________________________________
>>>> Analytics mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>
>>
>> --
>> Olga Vasileva // Product Manager // Reading Web Team
>> https://wikimediafoundation.org/
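
To make the country-cookie and DNT points from the top of this thread a bit more concrete, here is a rough sketch of the logic only, written in Python so it can be run standalone (the real thing would live in client-side code). Everything named here is an assumption for illustration: the cookie name `GeoIP`, its colon-separated layout with the country code first, the schema name `PageInteraction`, and the helper names. It also just skips DNT traffic rather than answering how to impute it.

```python
from typing import Optional


def country_from_cookie(cookie_header: str, cookie_name: str = "GeoIP") -> Optional[str]:
    """Return a two-letter country code from a geo cookie, or None.

    Assumes (hypothetically) a cookie value such as
    "US:CA:San_Francisco:37.77:-122.41:v4", with the country code first.
    """
    for part in cookie_header.split(";"):
        name, _, value = part.strip().partition("=")
        if name == cookie_name and value:
            country = value.split(":", 1)[0]
            if len(country) == 2 and country.isalpha():
                return country.upper()
    return None


def build_event(schema: str, payload: dict, cookie_header: str,
                dnt_header: Optional[str]) -> Optional[dict]:
    """Build an event dict, or return None when the client sent DNT: 1."""
    if dnt_header == "1":
        # Do Not Track is set: emit nothing for this client.
        return None
    event = {"schema": schema, "event": dict(payload)}
    country = country_from_cookie(cookie_header)
    if country is not None:
        # Only the country code is attached, never the finer-grained fields.
        event["event"]["country"] = country
    return event


if __name__ == "__main__":
    cookies = "foo=1; GeoIP=US:CA:San_Francisco:37.77:-122.41:v4"
    print(build_event("PageInteraction", {"page_title": "Cat"}, cookies, None))
    print(build_event("PageInteraction", {"page_title": "Cat"}, cookies, "1"))
```

Attaching only the country code keeps the event at the granularity asked about earlier in the thread; whether that is enough geo detail is still the open question for the Readers Web team.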
