> I have to admit that I haven't read all of this rather lengthy thread, but why wouldn't we just track this with EventLogging?

I think EventLogging is best used for tracking "events", not pageviews. We do not need a capsule + schema + validation system just to count pageviews. Plain requests would work fine and are a much simpler solution for this use case.
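To make the plain-request idea concrete, here is a rough sketch of the two halves of the hack proposed earlier in this thread: building the fake thumbnail URL for a view, and recognizing such hits when going through request-log URLs. It only follows the URL shape from the proposal (Virtual-imageview-<real image name>/<size>px-thumbnail.<ext>); the function names, the https scheme and the percent-encoding of the image name are my assumptions, not anything that exists in MediaViewer or the log pipeline today.

```python
import re
from collections import Counter
from urllib.parse import quote, unquote

# Assumption: prefix and path taken from the URL shape proposed in this thread.
VIRTUAL_PREFIX = "Virtual-imageview-"
BASE = "https://upload.wikimedia.org/wikipedia/commons/thumb/0/00/"

# Matches only the fake "virtual view" thumbnails, not real thumbnails.
_VIRTUAL_RE = re.compile(
    r"/thumb/0/00/"
    + re.escape(VIRTUAL_PREFIX)
    + r"(?P<name>[^/]+)/(?P<size>\d+)px-thumbnail\.(?P<ext>\w+)$"
)


def virtual_view_url(image_name, size, ext):
    """Build the fake URL a client would request to signal one image view (hypothetical helper)."""
    return f"{BASE}{VIRTUAL_PREFIX}{quote(image_name, safe='')}/{size}px-thumbnail.{ext}"


def parse_virtual_view(url):
    """Return (image_name, size) for a virtual-view hit, or None for any other URL."""
    match = _VIRTUAL_RE.search(url)
    if match is None:
        return None
    return unquote(match.group("name")), int(match.group("size"))


def count_virtual_views(urls):
    """Count virtual-view hits per image name, ignoring all other requests."""
    counts = Counter()
    for url in urls:
        parsed = parse_virtual_view(url)
        if parsed is not None:
            counts[parsed[0]] += 1
    return counts
```

This says nothing about whether the server should answer 404 or 204, and it assumes the logged URL carries no query string; both would need to be settled with ops.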
On Thu, Feb 5, 2015 at 3:16 PM, Oliver Keyes <[email protected]> wrote:

> Bandwidth, I imagine? 25M events is a lot of events on top of the
> existing throughput.
>
> On 5 February 2015 at 18:13, Ryan Kaldari <[email protected]> wrote:
> > I have to admit that I haven't read all of this rather lengthy thread,
> > but why wouldn't we just track this with EventLogging? That would avoid
> > all the pitfalls of other possible solutions: dealing with caching,
> > creating bogus extra file requests, etc.
> >
> > On Thu, Feb 5, 2015 at 8:51 AM, Toby Negrin <[email protected]> wrote:
> >>
> >> It turns out that the media viewer (on desktop; don't know about mobile)
> >> does a lot of caching so just because an image is loaded from swift, it
> >> doesn't mean it is viewed. We'd like to provide more accurate stats to
> >> the GLAM folks, so yes, I think this needs to be added eventually. Let's
> >> leave it out of scope for now.
> >>
> >> -Toby
> >>
> >> On Thu, Feb 5, 2015 at 8:46 AM, Oliver Keyes <[email protected]> wrote:
> >>>
> >>> We want to include these files in the pageview definition? :/.
> >>>
> >>> My point was more that we should try to avoid traffic-generating
> >>> requests that exist solely as a hack for analytics purposes; it's
> >>> artificial work for both users and us. If this is the only way of
> >>> doing things that's totally fine.
> >>>
> >>> On 5 February 2015 at 11:38, Toby Negrin <[email protected]> wrote:
> >>> > Hi Gergo -- I like this idea. As far as capacity, any EL-Hadoop based
> >>> > solution would be basically doing the same thing as you propose.
> >>> >
> >>> > Can you please run it past ops (especially the 404 v 204) part?
> >>> >
> >>> > Oliver -- the issue is that we'd like to figure out a way to provide
> >>> > accurate views of the media files; because of client side caching, we
> >>> > can't use the current requests. But your point is a good one -- we'll
> >>> > need to add this to the PV definition.
> >>> >
> >>> > -Toby
> >>> >
> >>> > On Thu, Feb 5, 2015 at 5:18 AM, Oliver Keyes <[email protected]> wrote:
> >>> >>
> >>> >> A nice theory, but if they appear in the webrequest table (presumably
> >>> >> they would, and we're not creating an entirely new set of varnishes
> >>> >> for the transmission of dummy images?) they have to be factored in.
> >>> >> Again, however, the new definition automatically filters them by
> >>> >> checking the webrequest source and MIME type, so this is not a
> >>> >> problem, as I originally stated.
> >>> >>
> >>> >> On 5 February 2015 at 08:10, Erik Zachte <[email protected]> wrote:
> >>> >> > Oliver, this is not about pageviews, but about media file views.
> >>> >> >
> >>> >> > These will be collected and dumped separately, as per
> >>> >> > https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts .
> >>> >> >
> >>> >> > Erik
> >>> >> >
> >>> >> > From: [email protected]
> >>> >> > [mailto:[email protected]] On Behalf Of Nuria Ruiz
> >>> >> > Sent: Wednesday, February 04, 2015 22:28
> >>> >> > To: A mailing list for the Analytics Team at WMF and everybody who
> >>> >> > has an interest in Wikipedia and analytics.
> >>> >> > Subject: Re: [Analytics] Virtual file view hack for Media Viewer views
> >>> >> >
> >>> >> >> We would add a rule to Vagrant to make sure it does not try to look
> >>> >> >> up such requests in Swift but returns a 404 immediately.
> >>> >> >
> >>> >> > I bet ops would like it a lot better if this is a 204 and it kind of
> >>> >> > makes sense as it is the code used for beacons and such. Otherwise
> >>> >> > they might get alarms on 404s increasing.
> >>> >> >
> >>> >> > On Wed, Feb 4, 2015 at 12:38 PM, Oliver Keyes <[email protected]> wrote:
> >>> >> >
> >>> >> > Not really; the new pageviews definition wouldn't include those
> >>> >> > files anyway. It seems silly, though, to be deliberately generating
> >>> >> > a large amount of automated noise and client requests for this :/.
> >>> >> >
> >>> >> > On 4 February 2015 at 15:00, Gergo Tisza <[email protected]> wrote:
> >>> >> >> Hi all,
> >>> >> >>
> >>> >> >> Erik Zachte is working on file view stats and is looking for a way
> >>> >> >> to track Media Viewer image views (for which there is no 1:1
> >>> >> >> relation between server hits and actual image views); after some
> >>> >> >> back and forth in https://phabricator.wikimedia.org/T86914 I
> >>> >> >> proposed the following hack:
> >>> >> >>
> >>> >> >> whenever the javascript code in MediaViewer determines that an
> >>> >> >> image view happened (e.g. an image has been displayed for a certain
> >>> >> >> amount of time), it makes a request to a certain fake image, say
> >>> >> >>
> >>> >> >> upload.wikimedia.org/wikipedia/commons/thumb/0/00/Virtual-imageview-<real image name>/<size>px-thumbnail.<ext> .
> >>> >> >>
> >>> >> >> These hits can then be easily filtered from the varnish request
> >>> >> >> logs and added to the normal requests. We would add a rule to
> >>> >> >> Vagrant to make sure it does not try to look up such requests in
> >>> >> >> Swift but returns a 404 immediately.
> >>> >> >>
> >>> >> >> This would be a temporary workaround until there is a proper way
> >>> >> >> to log virtual image views, such as EventLogging with a non-SQL
> >>> >> >> backend.
> >>> >> >>
> >>> >> >> Do you see any fundamental problem with this?
> >>> >> >>
> >>> >> >> _______________________________________________
> >>> >> >> Analytics mailing list
> >>> >> >> [email protected]
> >>> >> >> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>> >> >
> >>> >> > --
> >>> >> > Oliver Keyes
> >>> >> > Research Analyst
> >>> >> > Wikimedia Foundation
