On Wed, Jan 7, 2015 at 6:25 PM, Oliver Keyes wrote:

> We get 120,000 requests a second. We're not storing them all for six
> months. But we do have sampled logs going back that far.


That would be great! Are those in Hadoop?

On Wed, Jan 7, 2015 at 11:36 PM, Oliver Keyes wrote:

> Not particularly, I don't think - except to remember that namespace
> names are localised, so you're going to have a whale of a time
> matching them (unless you just look for file endings, I guess).
>

In the case of NavigationTiming, the nsid is recorded, so that wasn't a
problem; but it was only added around May, so for the period before that
there is no namespace information at all.

Localized file namespace names don't sound so bad: I can look up all the
translations in Translatewiki and construct a regexp or a similar
condition. There could be fun exceptions, like namespace translations that
have changed recently, but I would be fine with assuming that the error
caused by those is not significant.
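A minimal sketch of that approach in Python follows. It assumes the sampled logs expose the request path in the /wiki/<Title> form (index.php?title=... URLs are not handled). FILE_NAMESPACE_NAMES is a hypothetical, incomplete sample: the real list would come from Translatewiki or each wiki's namespace configuration, and would also need aliases such as the legacy English "Image". The has_file_ending() check is the cruder "just look for file endings" fallback, using an illustrative extension list.

```python
import re
from urllib.parse import unquote

# Hypothetical, incomplete sample of localized names and aliases for the
# File namespace (namespace ID 6); a real list would come from Translatewiki
# or each wiki's namespace configuration.
FILE_NAMESPACE_NAMES = ["File", "Image", "Datei", "Fichier", "Archivo", "Файл"]

# Illustrative extension list for the cruder "look for file endings" fallback.
FILE_EXTENSIONS = ("jpg", "jpeg", "png", "gif", "svg", "tif", "tiff", "pdf", "ogg", "webm")


def build_namespace_regex(names):
    """Compile a regex matching '/wiki/<Name>:' for any of the given names."""
    alternatives = []
    for name in names:
        # Escape each word; a space in a namespace name may appear as '_' in URLs.
        alternatives.append("[ _]".join(re.escape(part) for part in name.split(" ")))
    # Namespace prefixes are matched case-insensitively here, as MediaWiki does.
    return re.compile(r"^/wiki/(?:" + "|".join(alternatives) + r"):", re.IGNORECASE)


def is_file_page(path, ns_regex):
    """True if the (possibly percent-encoded) path is in a File namespace."""
    return ns_regex.match(unquote(path)) is not None


def has_file_ending(path):
    """Fallback: True if the decoded path ends in a known file extension."""
    decoded = unquote(path).lower()
    return decoded.endswith(tuple("." + ext for ext in FILE_EXTENSIONS))


if __name__ == "__main__":
    ns_re = build_namespace_regex(FILE_NAMESPACE_NAMES)
    for p in ["/wiki/Datei:Beispiel.jpg",
              "/wiki/%D0%A4%D0%B0%D0%B9%D0%BB:Example.png",
              "/wiki/Berlin"]:
        print(p, is_file_page(p, ns_re), has_file_ending(p))
```

Paths are percent-decoded before matching, because non-Latin namespace names (like the Cyrillic one above) arrive URL-encoded. The file-ending check needs no namespace list at all, but it will also match article titles that happen to end in something like ".svg", so it is probably best treated as a rough cross-check.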
_______________________________________________
Analytics mailing list
https://lists.wikimedia.org/mailman/listinfo/analytics
