Nuria & Erik: you're totally right, I keep forgetting this problem is more complicated than I think.
So we should figure out how this statsv magic thing works and see if we can use it here.

On Thu, Feb 5, 2015 at 4:41 PM, Nuria Ruiz <[email protected]> wrote:

> >[Oliver] My point was more that we should try to avoid traffic-generating
> >[Oliver] requests that exist solely as a hack for analytics purposes;
> >[Dan] Is this a potential solution to Oliver's concern:
>
> I disagree that we should be concerned about "beacons" to identify preloads.
> Just as beacons exist for ads or stats, using one to identify preloads doesn't
> seem far-fetched (certainly I have used similar code before and it did its
> job).
>
> Note that EL works in a similar fashion, requesting a "fake" image from
> varnish, to which we answer with a 204. It is very similar, and the reason
> we have such code is that we do not have a specific endpoint or
> domain where requests of this type could go. Pretty much everything requested
> by our users and ourselves ends up in varnish.
>
> On Thu, Feb 5, 2015 at 12:46 PM, Dan Andreescu <[email protected]>
> wrote:
>
>> Is this a potential solution to Oliver's concern:
>>
>> For "real" image views, add an X-Analytics header value of
>> "real-view=true" to the request itself?
>>
>> If that's not feasible, we should look into using statsv for this (not
>> sure how that works) or having this be a different kafka topic and not
>> consumed into HDFS.
>>
>> On Thu, Feb 5, 2015 at 11:59 AM, Toby Negrin <[email protected]>
>> wrote:
>>
>>> I created a card -- modify as desired:
>>>
>>> https://trello.com/c/HMgVD4mz
>>>
>>> -Toby
>>>
>>> On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin <[email protected]>
>>> wrote:
>>>
>>>> It turns out that the media viewer (on desktop; don't know about
>>>> mobile) does a lot of caching, so just because an image is loaded from
>>>> swift, it doesn't mean it is viewed. We'd like to provide more accurate
>>>> stats to the GLAM folks, so yes, I think this needs to be added eventually.
>>>> Let's leave it out of scope for now.
>>>>
>>>> -Toby
>>>>
>>>> On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes <[email protected]>
>>>> wrote:
>>>>
>>>>> We want to include these files in the pageview definition? :/.
>>>>>
>>>>> My point was more that we should try to avoid traffic-generating
>>>>> requests that exist solely as a hack for analytics purposes; it's
>>>>> artificial work for both users and us. If this is the only way of
>>>>> doing things that's totally fine.
>>>>>
>>>>> On 5 February 2015 at 11:38, Toby Negrin <[email protected]>
>>>>> wrote:
>>>>> > Hi Gergo -- I like this idea. As far as capacity, any EL-Hadoop based
>>>>> > solution would be basically doing the same thing as you propose.
>>>>> >
>>>>> > Can you please run it past ops (especially the 404 v 204 part)?
>>>>> >
>>>>> > Oliver -- the issue is that we'd like to figure out a way to provide
>>>>> > accurate views of the media files; because of client side caching, we can't
>>>>> > use the current requests. But your point is a good one -- we'll need to add
>>>>> > this to the PV definition.
>>>>> >
>>>>> > -Toby
>>>>> >
>>>>> > On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <[email protected]> wrote:
>>>>> >>
>>>>> >> A nice theory, but if they appear in the webrequest table (presumably
>>>>> >> they would, and we're not creating an entirely new set of varnishes
>>>>> >> for the transmission of dummy images?) they have to be factored in.
>>>>> >> Again, however, the new definition automatically filters them by
>>>>> >> checking the webrequest source and MIME type, so this is not a
>>>>> >> problem, as I originally stated.
>>>>> >>
>>>>> >> On 5 February 2015 at 08:10, Erik Zachte <[email protected]> wrote:
>>>>> >> > Oliver, this is not about pageviews, but about media file views.
>>>>> >> >
>>>>> >> > These will be collected and dumped separately, as per
>>>>> >> > https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts .
>>>>> >> >
>>>>> >> > Erik
>>>>> >> >
>>>>> >> > From: [email protected]
>>>>> >> > [mailto:[email protected]] On Behalf Of Nuria Ruiz
>>>>> >> > Sent: Wednesday, February 04, 2015 22:28
>>>>> >> > To: A mailing list for the Analytics Team at WMF and everybody who has an
>>>>> >> > interest in Wikipedia and analytics.
>>>>> >> > Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
>>>>> >> >
>>>>> >> >> We would add a rule to Varnish to make sure it does not try to look up such
>>>>> >> >> requests in Swift but returns a 404 immediately.
>>>>> >> >
>>>>> >> > I bet ops would like it a lot better if this is a 204, and it kind of makes
>>>>> >> > sense as it is the code used for beacons and such. Otherwise they might get
>>>>> >> > alarms on 404s increasing.
>>>>> >> >
>>>>> >> > On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <[email protected]>
>>>>> >> > wrote:
>>>>> >> >
>>>>> >> > Not really; the new pageviews definition wouldn't include those files
>>>>> >> > anyway. It seems silly, though, to be deliberately generating a large
>>>>> >> > amount of automated noise and client requests for this :/.
>>>>> >> >
>>>>> >> > On 4 February 2015 at 15:00, Gergo Tisza <[email protected]> wrote:
>>>>> >> >> Hi all,
>>>>> >> >>
>>>>> >> >> Erik Zachte is working on file view stats and is looking for a way to track
>>>>> >> >> Media Viewer image views (for which there is no 1:1 relation between server
>>>>> >> >> hits and actual image views); after some back and forth in
>>>>> >> >> https://phabricator.wikimedia.org/T86914 I proposed the following hack:
>>>>> >> >>
>>>>> >> >> Whenever the JavaScript code in MediaViewer determines that an image view
>>>>> >> >> happened (e.g. an image has been displayed for a certain amount of time), it
>>>>> >> >> makes a request to a certain fake image, say
>>>>> >> >>
>>>>> >> >> upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>
>>>>> >> >>
>>>>> >> >> These hits can then be easily filtered from the varnish request logs and
>>>>> >> >> added to the normal requests. We would add a rule to Varnish to make sure it
>>>>> >> >> does not try to look up such requests in Swift but returns a 404 immediately.
>>>>> >> >>
>>>>> >> >> This would be a temporary workaround until there is a proper way to log
>>>>> >> >> virtual image views, such as EventLogging with a non-SQL backend.
>>>>> >> >>
>>>>> >> >> Do you see any fundamental problem with this?
>>>>> >> >
>>>>> >> > --
>>>>> >> > Oliver Keyes
>>>>> >> > Research Analyst
>>>>> >> > Wikimedia Foundation
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
