> We're moving forward to generate Hive queries that will represent the formal
> specification.

Should a specific implementation (e.g. Hive) represent the formal specification? I tend to think it should be tech-agnostic, no?

> On Dec 15, 2014, at 12:15, Aaron Halfaker <[email protected]> wrote:
>
> Toby, that's right. We're moving forward to generate Hive queries that will
> represent the formal specification.
>
> -Aaron
>
> On Mon, Dec 15, 2014 at 9:12 AM, Oliver Keyes <[email protected]> wrote:
> We've written the draft Hive queries and I'm reviewing them with Otto now.
> Currently blocked on Hadoop heapsize issues, but I'm sure we'll work it
> through :).
>
> On 15 December 2014 at 12:10, Toby Negrin <[email protected]> wrote:
> Hi Aaron, all --
>
> I haven't seen any discussion on this, which is a sign that we can move
> forward with turning over the draft. Thoughts?
>
> thanks,
>
> -Toby
>
> On Tue, Dec 9, 2014 at 5:15 PM, Aaron Halfaker <[email protected]> wrote:
> Hey folks,
>
> As discussions on the new page view definition have been calming down, we're
> preparing to deliver a draft version to the Devs. I want to make sure that
> we all know the status and that any substantial concerns are raised before we
> hand things off on Friday, Dec 12th.
>
> For this phase, we are delivering the general filter[1]. This is the highest
> level filter, and exists primarily to distinguish requests worthy of further
> evaluation. Our plan is to take the definition as it exists on the 12th, and
> begin generating high-level aggregate numbers based on it. In future
> iterations, we will be digging into different breakdowns of this metric, and
> iterating on it to handle any inconsistencies or unexpected results.
>
> There are a few differences from Web Stat Collector's (WSC) version of the
> general filter that we want to call to your attention:
>
> * We include searches -- WSC explicitly excludes them.
> * We include Apps traffic -- WSC does not detect Apps traffic.
> * We include variants of /wiki/ (e.g. /zh-tw/, /zh-cn/, /sr-ec/) -- WSC
>   hardcodes "/wiki/".
> * We don't include Banner impressions -- WSC includes them.
>
> There are also some known issues with the new definition that are worth your
> notice:
>
> * Internal traffic is counted.
>   Note that WSC filters some internal traffic by hardcoding a set of IPs in
>   the definition. We are working on parsing puppet templates in order to
>   automatically detect which IPs represent internal traffic. This will be a
>   /better/ solution, but it's not quite ready yet because parsing puppet is
>   hard.
> * Spider traffic is counted.
>   We will be using the User-agent field to detect and flag spider-based
>   traffic. This "tag definition" will be delivered in a subsequent definition.
>   This actually matches WSC, which does not filter spiders for the high-level
>   metrics.
>
> These are problems we're aware of, and will be factoring in as we go forward
> with our next task: refining the definition using real, hourly-level traffic
> data. Thanks to everyone who has given feedback and participated in the
> process thus far, particularly Nemo, Erik, and Christian.
>
> 1. https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
>
> -Aaron & Oliver
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation

_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
