I think it should be fine-ish; it depends on what we're calculating. When you say "geocoded information", what do you mean? Country? City? I wouldn't expect country to move about a lot in 60 days (which is the range of our data), but I would expect city to.

What's the status on getting an Oozie job or similar to compute this going forward? To me that's more of a priority than historical data.

On 23 February 2015 at 10:53, Joseph Allemandou <[email protected]> wrote:
> Hi,
>
> As part of my first assignment, I'll recompute our historical webrequest
> dataset, adding client_ip and geocoded information.
>
> While it seems correct to compute historical client_ip based on the existing
> ip and the x_forwarded_for, the use of the current state of the geocoded
> maxmind library to compute historical data is more error-prone.
>
> I can either compute it anyway, knowing that there'll be some errors, or put
> null values for data older than a given point in time.
>
> I'll launch the script to recompute the data as soon as max(a consensus is
> found on this matter, operations gives me the right to run the script) :)
>
> Thanks
> --
> Joseph Allemandou
> Data Engineer @ Wikimedia Foundation
> IRC: joal
>
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics
>

--
Oliver Keyes
Research Analyst
Wikimedia Foundation
