Makes sense. Yeah, I had an "assuming everyone knows what you know" moment; I appreciate that the automated query logging may not be a known thing (for the reasons Jeremy sets out, it's currently accessible only via an internal proxy, which makes it a wee bit difficult for people to know that it exists ;p). Sorry about that.
We could probably do it via Hadoop (it'd be a lot easier to automate!) if we come up with some useful heuristics for what automated activity looks like. I'm hoping that the spider/bot/automation identification as part of the pageviews definition will give us some of that.

On 20 October 2014 13:50, Jeremy Baron <[email protected]> wrote:
> On Oct 20, 2014 1:36 PM, "Oliver Keyes" <[email protected]> wrote:
> > I guess mostly I'm just confused as to what you'd add on top of "SSH
> > keys, automated logging and transparent documentation".
>
> I *think* Pine was asking for automatic query logging similar to what
> you've just said is already happening.
>
> Eventually maybe we'll get these types of queries mostly running on
> hadoop+M/R. (vs. processing a local file on disk) We could publish public
> logs of M/R jobs and for some of them allow public download of the output.
> (but this particular query would not allow public downloading of the output
> because IP/UA string/etc.)
>
> -Jeremy
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
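[Editor's note: purely as an illustration of the kind of heuristic being discussed, here is a minimal sketch of user-agent-based automation flagging. The regex and function name are assumptions for illustration, not the actual pattern used in the pageviews definition.]

```python
import re

# Assumed, illustrative pattern: self-identifying bot markers commonly
# found in user-agent strings. The real pageviews-definition regex may
# differ substantially.
BOT_PATTERN = re.compile(r"bot|crawler|spider|http", re.IGNORECASE)

def looks_automated(user_agent):
    """Rough heuristic: flag a request as automated if its user-agent
    string matches common self-identifying bot markers, or is empty."""
    if not user_agent:
        # Requests with no UA string at all are usually scripted.
        return True
    return bool(BOT_PATTERN.search(user_agent))
```

A heuristic like this only catches well-behaved, self-identifying crawlers; automation that spoofs a browser UA would need behavioural signals (request rate, access patterns) on top.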
