IIRC, it's actually desirable to have these PVs in hadoop so we can run the queries in concert with mobile page views.
Erik Z -- thoughts?

-Toby

On Thu, Feb 5, 2015 at 12:46 PM, Dan Andreescu <[email protected]> wrote:

> Is this a potential solution to Oliver's concern:
>
> For "real" image views, add an X-Analytics header value of "real-view=true" to the request itself?
>
> If that's not feasible, we should look into using statsv for this (not sure how that works) or having this be a different Kafka topic that is not consumed into HDFS.
>
> On Thu, Feb 5, 2015 at 11:59 AM, Toby Negrin <[email protected]> wrote:
>
>> I created a card -- modify as desired:
>>
>> https://trello.com/c/HMgVD4mz
>>
>> -Toby
>>
>> On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin <[email protected]> wrote:
>>
>>> It turns out that the media viewer (on desktop; don't know about mobile) does a lot of caching, so just because an image is loaded from Swift doesn't mean it is viewed. We'd like to provide more accurate stats to the GLAM folks, so yes, I think this needs to be added eventually. Let's leave it out of scope for now.
>>>
>>> -Toby
>>>
>>> On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes <[email protected]> wrote:
>>>
>>>> We want to include these files in the pageview definition? :/
>>>>
>>>> My point was more that we should try to avoid traffic-generating requests that exist solely as a hack for analytics purposes; it's artificial work for both users and us. If this is the only way of doing things, that's totally fine.
>>>>
>>>> On 5 February 2015 at 11:38, Toby Negrin <[email protected]> wrote:
>>>> > Hi Gergo -- I like this idea. As far as capacity goes, any EL-Hadoop based solution would basically be doing the same thing as you propose.
>>>> >
>>>> > Can you please run it past ops (especially the 404 vs. 204 part)?
>>>> >
>>>> > Oliver -- the issue is that we'd like to figure out a way to provide accurate views of the media files; because of client-side caching, we can't use the current requests.
>>>> > But your point is a good one -- we'll need to add this to the PV definition.
>>>> >
>>>> > -Toby
>>>> >
>>>> > On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <[email protected]> wrote:
>>>> >>
>>>> >> A nice theory, but if they appear in the webrequest table (presumably they would, and we're not creating an entirely new set of varnishes for the transmission of dummy images?) they have to be factored in. Again, however, the new definition automatically filters them by checking the webrequest source and MIME type, so this is not a problem, as I originally stated.
>>>> >>
>>>> >> On 5 February 2015 at 08:10, Erik Zachte <[email protected]> wrote:
>>>> >> > Oliver, this is not about pageviews, but about media file views.
>>>> >> >
>>>> >> > These will be collected and dumped separately, as per https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts .
>>>> >> >
>>>> >> > Erik
>>>> >> >
>>>> >> > From: [email protected] [mailto:[email protected]] On Behalf Of Nuria Ruiz
>>>> >> > Sent: Wednesday, February 04, 2015 22:28
>>>> >> > To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
>>>> >> > Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
>>>> >> >
>>>> >> >> We would add a rule to Varnish to make sure it does not try to look up such requests in Swift but returns a 404 immediately.
>>>> >> >
>>>> >> > I bet ops would like it a lot better if this were a 204, and it kind of makes sense, as it is the code used for beacons and such. Otherwise they might get alarms on 404s increasing.
>>>> >> >
>>>> >> > On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <[email protected]> wrote:
>>>> >> >
>>>> >> > Not really; the new pageviews definition wouldn't include those files anyway. It seems silly, though, to be deliberately generating a large amount of automated noise and client requests for this :/
>>>> >> >
>>>> >> > On 4 February 2015 at 15:00, Gergo Tisza <[email protected]> wrote:
>>>> >> >> Hi all,
>>>> >> >>
>>>> >> >> Erik Zachte is working on file view stats and is looking for a way to track Media Viewer image views (for which there is no 1:1 relation between server hits and actual image views); after some back and forth in https://phabricator.wikimedia.org/T86914 I proposed the following hack:
>>>> >> >>
>>>> >> >> Whenever the JavaScript code in MediaViewer determines that an image view happened (e.g. an image has been displayed for a certain amount of time), it makes a request to a certain fake image, say
>>>> >> >>
>>>> >> >> upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>
>>>> >> >>
>>>> >> >> These hits can then be easily filtered from the Varnish request logs and added to the normal requests. We would add a rule to Varnish to make sure it does not try to look up such requests in Swift but returns a 404 immediately.
>>>> >> >>
>>>> >> >> This would be a temporary workaround until there is a proper way to log virtual image views, such as EventLogging with a non-SQL backend.
>>>> >> >>
>>>> >> >> Do you see any fundamental problem with this?
>>>> >> >>
>>>> >> >> _______________________________________________
>>>> >> >> Analytics mailing list
>>>> >> >> [email protected]
>>>> >> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>> >> >
>>>> >> > --
>>>> >> > Oliver Keyes
>>>> >> > Research Analyst
>>>> >> > Wikimedia Foundation
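Gergo's proposal comes down to two pieces: a URL convention the client requests once a view counts, and a filter over the request logs that turns those hits back into per-image view counts. The Python below is a minimal sketch of both halves, assuming the path layout from his example URL. The function names `virtual_view_url` and `count_virtual_views`, the optional-query-string handling, and the input (a plain list of request URLs rather than real Varnish log records) are hypothetical illustrations, not MediaViewer's or the analytics pipeline's actual code.

```python
import re
from collections import Counter
from urllib.parse import quote, unquote

# Hypothetical constants taken from the example URL in the proposal;
# the deployed scheme (if any) may differ.
VIRTUAL_PREFIX = "Virtual-imageview-"
BASE = "upload.wikimedia.org/wikipedia/commons/thumb/0/00/"

# Matches .../thumb/0/00/Virtual-imageview-<name>/<size>px-thumbnail.<ext>,
# optionally followed by a query string.
VIRTUAL_RE = re.compile(
    r"/wikipedia/commons/thumb/0/00/"
    + re.escape(VIRTUAL_PREFIX)
    + r"(?P<name>[^/?]+)/(?P<size>\d+)px-thumbnail\.(?P<ext>\w+)(?:\?.*)?$"
)


def virtual_view_url(image_name: str, size: int, ext: str) -> str:
    """Build the fake URL a client would request once it decides a view happened."""
    # safe="" also percent-encodes "/", so the name always stays one path segment.
    encoded = quote(image_name, safe="")
    return f"https://{BASE}{VIRTUAL_PREFIX}{encoded}/{size}px-thumbnail.{ext}"


def count_virtual_views(request_urls):
    """Tally virtual views per (real image name, size) from logged request URLs."""
    counts = Counter()
    for url in request_urls:
        match = VIRTUAL_RE.search(url)
        if match:  # ordinary thumbnail and original-file requests are skipped
            name = unquote(match.group("name"))
            counts[(name, int(match.group("size")))] += 1
    return counts


if __name__ == "__main__":
    logged = [
        virtual_view_url("Example.jpg", 640, "jpg"),
        virtual_view_url("Example.jpg", 640, "jpg"),
        # A normal thumbnail request; it must not be counted as a virtual view.
        "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a9/Example.jpg/640px-Example.jpg",
    ]
    print(count_virtual_views(logged))
```

Keeping the real file name inside the fake path lets the log filter attribute each hit to the original file without any extra lookup; the counts could then be merged with the normal file request counts, as the proposal suggests. The trade-off, as Oliver points out, is one extra client request per counted view.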
