>I'm not sure why a beacon would have to be a dummy html file, thus confusing PV stats.
>Could it not be a dummy image request, more in line with the one pixel images that are often used.

Right. I agree a dummy image makes more sense.

On Thu, Feb 5, 2015 at 2:28 PM, Erik Zachte <[email protected]> wrote:

> I'm not sure why a beacon would have to be a dummy html file, thus
> confusing PV stats.
>
> Could it not be a dummy image request, more in line with the one pixel
> images that are often used.
>
> This way Oliver can relax, go on vacation for real, without keeping a
> close watch over PV definitions.
>
> *From:* [email protected] [mailto:[email protected]] *On Behalf Of *Dan Andreescu
> *Sent:* Thursday, February 05, 2015 22:43
> *To:* A mailing list for the Analytics Team at WMF and everybody who has
> an interest in Wikipedia and analytics.
> *Subject:* Re: [Analytics] Virtual file view hack for Media Viewer views
>
> Nuria & Erik: you're totally right, I keep forgetting this problem is more
> complicated than I think.
>
> So we should figure out how this statsv magic thing works and see if we
> can use it here.
>
> On Thu, Feb 5, 2015 at 4:41 PM, Nuria Ruiz <[email protected]> wrote:
>
> >[Oliver] My point was more that we should try to avoid traffic-generating
> >[Oliver] requests that exist solely as a hack for analytics purposes;
>
> >[Dan] Is this a potential solution to Oliver's concern:
>
> I disagree we should be concerned about "beacons" to identify preloads, just
> like beacons exist for ads or stats using one to identify preloads doesn't
> seem far fetched (certainly I have used similar code before and it did its
> job).
>
> Note that EL works in a similar fashion requesting a "fake" image to
> varnish to which we answer with a 204. It is very similar and the reason
> why we have such a code is that we do not have a specific endpoint or
> domain where requests of this type could go. Everything requested by our
> users and ourselves ends up in varnish pretty much.
>
> On Thu, Feb 5, 2015 at 12:46 PM, Dan Andreescu <[email protected]> wrote:
>
> Is this a potential solution to Oliver's concern:
>
> For "real" image views, add an X-Analytics header value of
> "real-view=true" to the request itself?
>
> If that's not feasible, we should look into using statsv for this (not
> sure how that works) or having this be a different kafka topic and not
> consumed into HDFS.
>
> On Thu, Feb 5, 2015 at 11:59 AM, Toby Negrin <[email protected]> wrote:
>
> I created a card -- modify as desired:
>
> https://trello.com/c/HMgVD4mz
>
> -Toby
>
> On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin <[email protected]> wrote:
>
> It turns out that the media viewer (on desktop; don't know about mobile)
> does a lot of caching so just because an image is loaded from swift, it
> doesn't mean it is viewed. We'd like to provide more accurate stats to the
> GLAM folks, so yes, I think this needs to be added eventually. Let's leave
> it out of scope for now.
>
> -Toby
>
> On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes <[email protected]> wrote:
>
> We want to include these files in the pageview definition? :/.
>
> My point was more that we should try to avoid traffic-generating
> requests that exist solely as a hack for analytics purposes; it's
> artificial work for both users and us. If this is the only way of
> doing things that's totally fine.
>
> On 5 February 2015 at 11:38, Toby Negrin <[email protected]> wrote:
> > Hi Gergo -- I like this idea. As far as capacity, any EL-Hadoop based
> > solution would be basically doing the same thing as you propose.
> >
> > Can you please run it past ops (especially the 404 v 204) part?
> >
> > Oliver -- the issue is that we'd like to figure out a way to provide
> > accurate views of the media files; because of client side caching, we can't
> > use the current requests.
> > But your point is a good one -- we'll need to add
> > this to the PV definition.
> >
> > -Toby
> >
> > On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <[email protected]> wrote:
> >>
> >> A nice theory, but if they appear in the webrequest table (presumably
> >> they would, and we're not creating an entirely new set of varnishes
> >> for the transmission of dummy images?) they have to be factored in.
> >> Again, however, the new definition automatically filters them by
> >> checking the webrequest source and MIME type, so this is not a
> >> problem, as I originally stated.
> >>
> >> On 5 February 2015 at 08:10, Erik Zachte <[email protected]> wrote:
> >> > Oliver, this is not about pageviews, but about media file views.
> >> >
> >> > These will be collected and dumped separately, as per
> >> > https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts
> >> >
> >> > Erik
> >> >
> >> > From: [email protected] [mailto:[email protected]] On Behalf Of Nuria Ruiz
> >> > Sent: Wednesday, February 04, 2015 22:28
> >> > To: A mailing list for the Analytics Team at WMF and everybody who has
> >> > an interest in Wikipedia and analytics.
> >> > Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
> >> >
> >> >>We would add a rule to Vagrant to make sure it does not try to look up
> >> >> such requests in Swift but returns a 404 immediately.
> >> >
> >> > I bet ops would like it a lot better if this is a 204 and it kind of
> >> > makes sense as it is the code used for beacons and such. Otherwise they
> >> > might get alarms on 404s increasing.
> >> >
> >> > On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <[email protected]> wrote:
> >> >
> >> > Not really; the new pageviews definition wouldn't include those files
> >> > anyway. It seems silly, though, to be deliberately generating a large
> >> > amount of automated noise and client requests for this :/.
> >> >
> >> > On 4 February 2015 at 15:00, Gergo Tisza <[email protected]> wrote:
> >> >> Hi all,
> >> >>
> >> >> Erik Zachte is working on file view stats and is looking for a way to
> >> >> track Media Viewer image views (for which there is no 1:1 relation
> >> >> between server hits and actual image views); after some back and forth
> >> >> in https://phabricator.wikimedia.org/T86914 I proposed the following
> >> >> hack:
> >> >>
> >> >> whenever the javascript code in MediaViewer determines that an image
> >> >> view happened (e.g. an image has been displayed for a certain amount of
> >> >> time), it makes a request to a certain fake image, say
> >> >>
> >> >> upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext> .
> >> >>
> >> >> These hits can then be easily filtered from the varnish request logs
> >> >> and added to the normal requests. We would add a rule to Vagrant to
> >> >> make sure it does not try to look up such requests in Swift but returns
> >> >> a 404 immediately.
> >> >>
> >> >> This would be a temporary workaround until there is a proper way to log
> >> >> virtual image views, such as EventLogging with a non-SQL backend.
> >> >>
> >> >> Do you see any fundamental problem with this?
> >> >>
> >> >> _______________________________________________
> >> >> Analytics mailing list
> >> >> [email protected]
> >> >> https://lists.wikimedia.org/mailman/listinfo/analytics
> >> >
> >> > --
> >> > Oliver Keyes
> >> > Research Analyst
> >> > Wikimedia Foundation
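
For illustration only, here is a minimal Python sketch of the fake-image idea from Gergo's original proposal: build the virtual thumbnail path for a viewed image, then pick such paths back out of a list of request paths and count them per image. The "Virtual-imageview-" marker and the path layout come from the proposal; everything else is an assumption made for this sketch and not part of MediaViewer or the analytics pipeline: the function names, the percent-encoding of the image name, and the idea that log processing sees bare URL paths.

```python
import re
from collections import Counter
from urllib.parse import quote, unquote

# Path layout and marker as described in the proposal.
BASE_PATH = "/wikipedia/commons/thumb/0/00/"
VIRTUAL_PREFIX = "Virtual-imageview-"

# Hypothetical matcher for virtual view paths; real thumbnails never match
# because their file name does not start with the marker.
_VIRTUAL_VIEW = re.compile(
    "^" + re.escape(BASE_PATH + VIRTUAL_PREFIX)
    + r"(?P<name>[^/]+)/(?P<size>\d+)px-thumbnail\.(?P<ext>\w+)$"
)


def virtual_view_path(image_name, size_px, ext):
    """Build the fake thumbnail path a client would request after a view.

    Percent-encoding the image name is an assumption of this sketch.
    """
    encoded = quote(image_name, safe="")
    return f"{BASE_PATH}{VIRTUAL_PREFIX}{encoded}/{size_px}px-thumbnail.{ext}"


def parse_virtual_view(path):
    """Return (image_name, size_px) for a virtual view path, else None."""
    match = _VIRTUAL_VIEW.match(path)
    if match is None:
        return None
    return unquote(match.group("name")), int(match.group("size"))


def count_virtual_views(paths):
    """Count virtual views per image name, ignoring all other requests."""
    counts = Counter()
    for path in paths:
        parsed = parse_virtual_view(path)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts


if __name__ == "__main__":
    # Made-up request paths, for demonstration only.
    sample = [
        virtual_view_path("Example image.jpg", 800, "jpg"),
        "/wikipedia/commons/thumb/a/ab/Foo.jpg/220px-Foo.jpg",
        virtual_view_path("Example image.jpg", 1024, "jpg"),
    ]
    print(count_virtual_views(sample))
```

Because the marker sits in a fixed position in the path, the filter is a single anchored pattern match per request. Whether the server answers such requests with a 404 or a 204 does not affect this step.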
