Bandwidth, I imagine? 25M events is a lot of events on top of the existing throughput.

On 5 February 2015 at 18:13, Ryan Kaldari <[email protected]> wrote:
> I have to admit that I haven't read all of this rather lengthy thread, but
> why wouldn't we just track this with EventLogging? That would avoid all the
> pitfalls of other possible solutions: dealing with caching, creating bogus
> extra file requests, etc.
>
> On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin <[email protected]> wrote:
>>
>> It turns out that the media viewer (on desktop; don't know about mobile)
>> does a lot of caching so just because an image is loaded from swift, it
>> doesn't mean it is viewed. We'd like to provide more accurate stats to the
>> GLAM folks, so yes, I think this needs to be added eventually. Let's leave
>> it out of scope for now.
>>
>> -Toby
>>
>> On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes <[email protected]> wrote:
>>>
>>> We want to include these files in the pageview definition? :/.
>>>
>>> My point was more that we should try to avoid traffic-generating
>>> requests that exist solely as a hack for analytics purposes; it's
>>> artificial work for both users and us. If this is the only way of
>>> doing things that's totally fine.
>>>
>>> On 5 February 2015 at 11:38, Toby Negrin <[email protected]> wrote:
>>> > Hi Gergo -- I like this idea. As far as capacity, any EL-Hadoop based
>>> > solution would be basically doing the same thing as you propose.
>>> >
>>> > Can you please run it past ops (especially the 404 v 204) part?
>>> >
>>> > Oliver -- the issue is that we'd like to figure out a way to provide
>>> > accurate views of the media files; because of client side caching, we
>>> > can't use the current requests. But your point is a good one -- we'll
>>> > need to add this to the PV definition.
>>> >
>>> > -Toby
>>> >
>>> > On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <[email protected]> wrote:
>>> >>
>>> >> A nice theory, but if they appear in the webrequest table (presumably
>>> >> they would, and we're not creating an entirely new set of varnishes
>>> >> for the transmission of dummy images?) they have to be factored in.
>>> >> Again, however, the new definition automatically filters them by
>>> >> checking the webrequest source and MIME type, so this is not a
>>> >> problem, as I originally stated.
>>> >>
>>> >> On 5 February 2015 at 08:10, Erik Zachte <[email protected]> wrote:
>>> >> > Oliver, this is not about pageviews, but about media file views.
>>> >> >
>>> >> > These will be collected and dumped separately, as per
>>> >> > https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts .
>>> >> >
>>> >> > Erik
>>> >> >
>>> >> > From: [email protected]
>>> >> > [mailto:[email protected]] On Behalf Of Nuria Ruiz
>>> >> > Sent: Wednesday, February 04, 2015 22:28
>>> >> > To: A mailing list for the Analytics Team at WMF and everybody who
>>> >> > has an interest in Wikipedia and analytics.
>>> >> > Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
>>> >> >
>>> >> >> We would add a rule to Varnish to make sure it does not try to look
>>> >> >> up such requests in Swift but returns a 404 immediately.
>>> >> >
>>> >> > I bet ops would like it a lot better if this is a 204 and it kind of
>>> >> > makes sense as it is the code used for beacons and such. Otherwise
>>> >> > they might get alarms on 404s increasing.
>>> >> >
>>> >> > On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <[email protected]> wrote:
>>> >> >
>>> >> > Not really; the new pageviews definition wouldn't include those files
>>> >> > anyway. It seems silly, though, to be deliberately generating a large
>>> >> > amount of automated noise and client requests for this :/.
>>> >> >
>>> >> > On 4 February 2015 at 15:00, Gergo Tisza <[email protected]> wrote:
>>> >> >> Hi all,
>>> >> >>
>>> >> >> Erik Zachte is working on file view stats and is looking for a way to
>>> >> >> track Media Viewer image views (for which there is no 1:1 relation
>>> >> >> between server hits and actual image views); after some back and
>>> >> >> forth in https://phabricator.wikimedia.org/T86914 I proposed the
>>> >> >> following hack:
>>> >> >>
>>> >> >> whenever the javascript code in MediaViewer determines that an image
>>> >> >> view happened (e.g. an image has been displayed for a certain amount
>>> >> >> of time), it makes a request to a certain fake image, say
>>> >> >>
>>> >> >> upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext> .
>>> >> >>
>>> >> >> These hits can then be easily filtered from the varnish request logs
>>> >> >> and added to the normal requests. We would add a rule to Varnish to
>>> >> >> make sure it does not try to look up such requests in Swift but
>>> >> >> returns a 404 immediately.
>>> >> >>
>>> >> >> This would be a temporary workaround until there is a proper way to
>>> >> >> log virtual image views, such as EventLogging with a non-SQL backend.
>>> >> >>
>>> >> >> Do you see any fundamental problem with this?

--
Oliver Keyes
Research Analyst
Wikimedia Foundation

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
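
For anyone skimming the thread, the virtual image view hack from the original proposal could be sketched roughly as follows in Python. This is only an illustration under assumptions: the path layout is copied from the example URL in the proposal, while the function names, the percent-encoding of the image name, and the per-image counting are hypothetical and not taken from MediaViewer or any existing log pipeline.

```python
import re
from urllib.parse import quote, unquote

# Path prefix copied from the example URL in the proposal (host omitted).
VIRTUAL_PREFIX = "/wikipedia/commons/thumb/0/00/Virtual-imageview-"

# Matches <prefix><real image name>/<size>px-thumbnail.<ext>
VIRTUAL_VIEW_RE = re.compile(
    re.escape(VIRTUAL_PREFIX)
    + r"(?P<name>[^/]+)/(?P<size>\d+)px-thumbnail\.(?P<ext>\w+)$"
)


def virtual_view_path(image_name: str, size: int, ext: str) -> str:
    """Build the fake thumbnail path a client would request for one view.

    Percent-encoding the name is an assumption made for this sketch.
    """
    encoded = quote(image_name, safe="")
    return f"{VIRTUAL_PREFIX}{encoded}/{size}px-thumbnail.{ext}"


def parse_virtual_view(path: str):
    """Return (image name, size) for a virtual view hit, or None otherwise."""
    match = VIRTUAL_VIEW_RE.search(path)
    if match is None:
        return None
    return unquote(match.group("name")), int(match.group("size"))


def count_views(paths):
    """Count virtual view hits per image name, ignoring all other requests."""
    counts = {}
    for path in paths:
        parsed = parse_virtual_view(path)
        if parsed is not None:
            name = parsed[0]
            counts[name] = counts.get(name, 0) + 1
    return counts
```

Feeding `count_views` the request paths from a log would leave ordinary thumbnail requests out of the tally, since only paths carrying the `Virtual-imageview-` marker match the pattern.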
