Erik,

Again: I am not, and at no point in this conversation have been, concerned about the pageview definition.
(Repeat no. 5)

On 5 February 2015 at 17:28, Erik Zachte <[email protected]> wrote:
> I'm not sure why a beacon would have to be a dummy HTML file, thus confusing PV stats.
>
> Could it not be a dummy image request, more in line with the one-pixel images that are often used?
>
> This way Oliver can relax and go on vacation for real, without keeping a close watch over PV definitions.
>
> From: [email protected] [mailto:[email protected]] On Behalf Of Dan Andreescu
> Sent: Thursday, February 05, 2015 22:43
> To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
> Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
>
> Nuria & Erik: you're totally right; I keep forgetting this problem is more complicated than I think.
>
> So we should figure out how this statsv magic thing works and see if we can use it here.
>
> On Thu, Feb 5, 2015 at 4:41 PM, Nuria Ruiz <[email protected]> wrote:
> >>[Oliver] My point was more that we should try to avoid traffic-generating
> >>[Oliver] requests that exist solely as a hack for analytics purposes;
> >>[Dan] Is this a potential solution to Oliver's concern:
>
> I disagree that we should be concerned about "beacons" to identify preloads. Just as beacons exist for ads or stats, using one to identify preloads doesn't seem far-fetched (I have certainly used similar code before, and it did its job).
>
> Note that EL works in a similar fashion: it requests a "fake" image from Varnish, which we answer with a 204. It is very similar, and the reason we have such code is that we do not have a specific endpoint or domain where requests of this type could go. Pretty much everything requested by our users and by ourselves ends up in Varnish.
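The beacon pattern Nuria describes (a request for a "fake" resource that is answered with 204 No Content, so the hit is logged but nothing is downloaded) can be sketched with Python's standard `http.server` module. This is only a local stand-in to show the request/response shape, not Varnish configuration; the `/beacon/` prefix and the `status_for` helper are hypothetical names chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical beacon path prefix; the thread does not fix a real URL.
BEACON_PREFIX = "/beacon/"


def status_for(path: str) -> int:
    """Return 204 for beacon requests and 404 for anything else."""
    # self.path includes any query string, so a prefix check is enough here.
    return 204 if path.startswith(BEACON_PREFIX) else 404


class BeaconHandler(BaseHTTPRequestHandler):
    """Answers beacon hits with an empty 204; the access log is the payload."""

    def do_GET(self) -> None:
        if status_for(self.path) == 204:
            # 204 No Content: the hit is recorded in the request log
            # (stderr by default) and no response body is sent.
            self.send_response(204)
            self.end_headers()
        else:
            self.send_error(404)
```

To try it locally, run `HTTPServer(("127.0.0.1", 8080), BeaconHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) and request a path such as `/beacon/imageview?file=Example.jpg`; each hit shows up as one log line with status 204.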
>
> On Thu, Feb 5, 2015 at 12:46 PM, Dan Andreescu <[email protected]> wrote:
>
> Is this a potential solution to Oliver's concern:
>
> For "real" image views, add an X-Analytics header value of "real-view=true" to the request itself?
>
> If that's not feasible, we should look into using statsv for this (not sure how that works) or having this be a different Kafka topic that is not consumed into HDFS.
>
> On Thu, Feb 5, 2015 at 11:59 AM, Toby Negrin <[email protected]> wrote:
>
> I created a card -- modify as desired:
>
> https://trello.com/c/HMgVD4mz
>
> -Toby
>
> On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin <[email protected]> wrote:
>
> It turns out that the media viewer (on desktop; don't know about mobile) does a lot of caching, so just because an image is loaded from Swift, it doesn't mean it is viewed. We'd like to provide more accurate stats to the GLAM folks, so yes, I think this needs to be added eventually. Let's leave it out of scope for now.
>
> -Toby
>
> On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes <[email protected]> wrote:
>
> We want to include these files in the pageview definition? :/.
>
> My point was more that we should try to avoid traffic-generating requests that exist solely as a hack for analytics purposes; it's artificial work for both users and us. If this is the only way of doing things, that's totally fine.
>
> On 5 February 2015 at 11:38, Toby Negrin <[email protected]> wrote:
>> Hi Gergo -- I like this idea. As far as capacity goes, any EL-Hadoop-based solution would basically be doing the same thing as you propose.
>>
>> Can you please run it past ops (especially the 404 v 204 part)?
>>
>> Oliver -- the issue is that we'd like to figure out a way to provide accurate views of the media files; because of client-side caching, we can't use the current requests. But your point is a good one -- we'll need to add this to the PV definition.
>>
>> -Toby
>>
>> On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <[email protected]> wrote:
>>>
>>> A nice theory, but if they appear in the webrequest table (presumably they would, and we're not creating an entirely new set of varnishes for the transmission of dummy images?) they have to be factored in. Again, however, the new definition automatically filters them out by checking the webrequest source and MIME type, so this is not a problem, as I originally stated.
>>>
>>> On 5 February 2015 at 08:10, Erik Zachte <[email protected]> wrote:
>>> > Oliver, this is not about pageviews, but about media file views.
>>> >
>>> > These will be collected and dumped separately, as per
>>> > https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts
>>> >
>>> > Erik
>>> >
>>> > From: [email protected] [mailto:[email protected]] On Behalf Of Nuria Ruiz
>>> > Sent: Wednesday, February 04, 2015 22:28
>>> > To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
>>> > Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
>>> >
>>> >>We would add a rule to Vagrant to make sure it does not try to look up such requests in Swift but returns a 404 immediately.
>>> >
>>> > I bet ops would like it a lot better if this were a 204, and it kind of makes sense, as it is the code used for beacons and such. Otherwise they might get alarms about increasing 404s.
>>> >
>>> > On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <[email protected]> wrote:
>>> >
>>> > Not really; the new pageviews definition wouldn't include those files anyway. It seems silly, though, to be deliberately generating a large amount of automated noise and client requests for this :/.
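Oliver's point that the new pageview definition filters these hits out by checking the webrequest source and MIME type can be illustrated with a small predicate. The record fields, the source values "text" and "upload", and the function name are assumptions modelled on the discussion (the fake images would live on upload.wikimedia.org); the real definition is not quoted in the thread and applies further conditions.

```python
from dataclasses import dataclass


@dataclass
class WebRequest:
    # Hypothetical subset of a webrequest log row.
    webrequest_source: str  # cache cluster that served the hit, e.g. "text" or "upload"
    content_type: str       # MIME type of the response
    uri_path: str


def passes_source_and_mime_check(req: WebRequest) -> bool:
    """Assumed rule: only HTML responses from the text cluster can be pageviews."""
    return req.webrequest_source == "text" and req.content_type.startswith("text/html")
```

Under these assumptions, a virtual image-view hit served from the upload cluster fails the check whatever its status code, which is why it would not inflate pageview counts.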
>>> >
>>> > On 4 February 2015 at 15:00, Gergo Tisza <[email protected]> wrote:
>>> >> Hi all,
>>> >>
>>> >> Erik Zachte is working on file view stats and is looking for a way to track Media Viewer image views (for which there is no 1:1 relation between server hits and actual image views); after some back and forth in https://phabricator.wikimedia.org/T86914 I proposed the following hack:
>>> >>
>>> >> whenever the JavaScript code in MediaViewer determines that an image view happened (e.g. an image has been displayed for a certain amount of time), it makes a request to a certain fake image, say
>>> >>
>>> >> upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>
>>> >>
>>> >> These hits can then be easily filtered from the Varnish request logs and added to the normal requests. We would add a rule to Vagrant to make sure it does not try to look up such requests in Swift but returns a 404 immediately.
>>> >>
>>> >> This would be a temporary workaround until there is a proper way to log virtual image views, such as EventLogging with a non-SQL backend.
>>> >>
>>> >> Do you see any fundamental problem with this?
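The URL scheme in Gergo's proposal can be sketched as a pair of helpers: one builds the fake thumbnail URL the client would request after a real view, and one recognises such paths in the request logs so they can be counted per file. The helper names, the space-to-underscore normalisation and the percent-encoding of the file name are assumptions for illustration; the proposal itself only fixes the overall URL shape.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple
from urllib.parse import quote, unquote

# Path prefix taken from the proposed URL shape.
VIRTUAL_PREFIX = "/wikipedia/commons/thumb/0/00/Virtual-imageview-"

VIRTUAL_PATH = re.compile(
    r"^/wikipedia/commons/thumb/0/00/Virtual-imageview-"
    r"(?P<name>[^/]+)/(?P<size>\d+)px-thumbnail\.(?P<ext>\w+)$"
)


def virtual_view_url(image_name: str, size: int, ext: str) -> str:
    """Build the fake URL a client would request after a real view (hypothetical helper)."""
    # Assumption: spaces become underscores, and the name is percent-encoded
    # (including "/") so that it stays a single path segment.
    name = quote(image_name.replace(" ", "_"), safe="")
    return f"https://upload.wikimedia.org{VIRTUAL_PREFIX}{name}/{size}px-thumbnail.{ext}"


def parse_virtual_view(path: str) -> Optional[Tuple[str, int, str]]:
    """Return (file name, size, extension) for a virtual-view path, else None."""
    match = VIRTUAL_PATH.match(path)
    if match is None:
        return None
    return unquote(match.group("name")), int(match.group("size")), match.group("ext")


def count_virtual_views(paths: Iterable[str]) -> Counter:
    """Count virtual image views per file name, ignoring ordinary requests."""
    counts: Counter = Counter()
    for path in paths:
        parsed = parse_virtual_view(path)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts
```

The parser works on the URL path only; log rows that carry a full URL or a query string would need to be split first, for example with `urllib.parse.urlsplit`.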
>>> >>
>>> >> _______________________________________________
>>> >> Analytics mailing list
>>> >> [email protected]
>>> >> https://lists.wikimedia.org/mailman/listinfo/analytics

-- 
Oliver Keyes
Research Analyst
Wikimedia Foundation
