Would this address Oliver's concern: for "real" image views, add an X-Analytics header value of "real-view=true" to the image request itself?
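To make the idea concrete, here is a rough Python sketch of how a log consumer might pick out such requests. It assumes the X-Analytics value is a semicolon-separated list of key=value pairs; the "real-view" key is only the suggestion above, and both function names are made up for illustration, not existing code.

```python
def parse_x_analytics(header):
    """Parse 'k1=v1;k2=v2' into a dict. Malformed pairs are skipped.

    Assumption: pairs are separated by ';' and keys/values by '='.
    """
    pairs = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key:
            pairs[key.strip()] = value.strip()
    return pairs


def is_real_view(header):
    """True if the (hypothetical) real-view flag is set to 'true'."""
    return parse_x_analytics(header).get("real-view") == "true"


if __name__ == "__main__":
    for value in ("real-view=true", "ns=0;real-view=true", "ns=0", ""):
        print(repr(value), is_real_view(value))
```

A filter like this would let the real views be counted from the existing request stream without generating extra requests.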
If that's not feasible, we should look into using statsv for this (not sure how that works) or sending this to a separate Kafka topic that is not consumed into HDFS.

On Thu, Feb 5, 2015 at 11:59 AM, Toby Negrin <[email protected]> wrote:

> I created a card -- modify as desired:
>
> https://trello.com/c/HMgVD4mz
>
> -Toby
>
> On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin <[email protected]> wrote:
>
>> It turns out that the media viewer (on desktop; don't know about mobile)
>> does a lot of caching so just because an image is loaded from swift, it
>> doesn't mean it is viewed. We'd like to provide more accurate stats to the
>> GLAM folks, so yes, I think this needs to be added eventually. Let's leave
>> it out of scope for now.
>>
>> -Toby
>>
>> On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes <[email protected]> wrote:
>>
>>> We want to include these files in the pageview definition? :/.
>>>
>>> My point was more that we should try to avoid traffic-generating
>>> requests that exist solely as a hack for analytics purposes; it's
>>> artificial work for both users and us. If this is the only way of
>>> doing things that's totally fine.
>>>
>>> On 5 February 2015 at 11:38, Toby Negrin <[email protected]> wrote:
>>> > Hi Gergo -- I like this idea. As far as capacity, any EL-Hadoop based
>>> > solution would be basically doing the same thing as you propose.
>>> >
>>> > Can you please run it past ops (especially the 404 v 204) part?
>>> >
>>> > Oliver -- the issue is that we'd like to figure out a way to provide
>>> > accurate views of the media files; because of client side caching, we can't
>>> > use the current requests. But your point is a good one -- we'll need to add
>>> > this to the PV definition.
>>> >
>>> > -Toby
>>> >
>>> > On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <[email protected]> wrote:
>>> >>
>>> >> A nice theory, but if they appear in the webrequest table (presumably
>>> >> they would, and we're not creating an entirely new set of varnishes
>>> >> for the transmission of dummy images?) they have to be factored in.
>>> >> Again, however, the new definition automatically filters them by
>>> >> checking the webrequest source and MIME type, so this is not a
>>> >> problem, as I originally stated.
>>> >>
>>> >> On 5 February 2015 at 08:10, Erik Zachte <[email protected]> wrote:
>>> >> > Oliver, this is not about pageviews, but about media file views.
>>> >> >
>>> >> > These will be collected and dumped separately, as per
>>> >> > https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts .
>>> >> >
>>> >> > Erik
>>> >> >
>>> >> > From: [email protected]
>>> >> > [mailto:[email protected]] On Behalf Of Nuria Ruiz
>>> >> > Sent: Wednesday, February 04, 2015 22:28
>>> >> > To: A mailing list for the Analytics Team at WMF and everybody who has an
>>> >> > interest in Wikipedia and analytics.
>>> >> > Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
>>> >> >
>>> >> >>We would add a rule to Vagrant to make sure it does not try to look up such
>>> >> >> requests in Swift but returns a 404 immediately.
>>> >> >
>>> >> > I bet ops would like it a lot better if this is a 204 and it kind of makes
>>> >> > sense as it is the code used for beacons and such. Otherwise they might get
>>> >> > alarms on 404s increasing.
>>> >> >
>>> >> > On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <[email protected]> wrote:
>>> >> >
>>> >> > Not really; the new pageviews definition wouldn't include those files
>>> >> > anyway. It seems silly, though, to be deliberately generating a large
>>> >> > amount of automated noise and client requests for this :/.
>>> >> >
>>> >> > On 4 February 2015 at 15:00, Gergo Tisza <[email protected]> wrote:
>>> >> >> Hi all,
>>> >> >>
>>> >> >> Erik Zachte is working on file view stats and is looking for a way to track
>>> >> >> Media Viewer image views (for which there is no 1:1 relation between server
>>> >> >> hits and actual image views); after some back and forth in
>>> >> >> https://phabricator.wikimedia.org/T86914 I proposed the following hack:
>>> >> >>
>>> >> >> whenever the javascript code in MediaViewer determines that an image view
>>> >> >> happened (e.g. an image has been displayed for a certain amount of time), it
>>> >> >> makes a request to a certain fake image, say
>>> >> >> upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext> .
>>> >> >> These hits can then be easily filtered from the varnish request logs and
>>> >> >> added to the normal requests. We would add a rule to Vagrant to make sure
>>> >> >> it does not try to look up such requests in Swift but returns a 404
>>> >> >> immediately.
>>> >> >>
>>> >> >> This would be a temporary workaround until there is a proper way to log
>>> >> >> virtual image views, such as EventLogging with a non-SQL backend.
>>> >> >>
>>> >> >> Do you see any fundamental problem with this?
>>> >> >
>>> >> > --
>>> >> > Oliver Keyes
>>> >> > Research Analyst
>>> >> > Wikimedia Foundation
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
