On Wed, Jan 7, 2015 at 6:25 PM, Oliver Keyes <[email protected]> wrote:
> We get 120,000 requests a second. We're not storing them all for six
> months. But we do have sampled logs going back that far.

That would be great! Are those in Hadoop?

On Wed, Jan 7, 2015 at 11:36 PM, Oliver Keyes <[email protected]> wrote:
> Not particularly, I don't think - except to remember that namespace
> names are localised, so you're going to have a whale of a time
> matching them (unless you just look for file endings, I guess).
>

In the case of NavigationTiming the nsid is recorded, so that wasn't a
problem; but it was only added around May, so for the period before
that there is no namespace information at all.

Localized file namespaces don't sound so bad - I can look up all the
translations in Translatewiki and construct a regexp or a similar
condition. There could be fun exceptions, like namespace translations
that have changed recently, but I would be fine with assuming that the
error caused by those is not significant.
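
To make the regexp idea concrete, here is a minimal sketch in Python. It
assumes the requests are available as URLs with the usual /wiki/<title>
path, and the list of namespace names is only a small illustrative sample
(a real list would be compiled from the translations). The function names
are made up for this example.

```python
import re
from urllib.parse import unquote

# Illustrative sample only: a few localized names of the file namespace.
# The real list would come from the namespace translations.
FILE_NAMESPACE_NAMES = ["File", "Image", "Datei", "Fichier", "Archivo", "Файл"]


def build_file_namespace_regex(names):
    """Build one case-insensitive pattern matching any of the given names."""
    # Longest names first, so a shorter name can never shadow a longer one.
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    # Assumption: page titles appear directly after "/wiki/" in the URL.
    return re.compile(r"/wiki/(?:" + alternatives + r"):", re.IGNORECASE)


FILE_PAGE_RE = build_file_namespace_regex(FILE_NAMESPACE_NAMES)


def is_file_page(url):
    """Return True if the URL looks like a page in a file namespace."""
    # Logged URLs are usually percent-encoded, so decode before matching.
    return FILE_PAGE_RE.search(unquote(url)) is not None
```

Namespace names that were renamed recently would simply be added to the
list next to the current ones; anything still missed falls under the
error I'm willing to ignore.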
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
