Oops sorry, I forgot to answer this question :)
A new map field named "geocoded_data" will contain, when available:

   - continent
   - country
   - country_code
   - subdivision
   - postal_code
   - city
   - timezone
   - latitude
   - longitude

For instance:
{"city":"Mukilteo","country_code":"US","longitude":"-122.3042","subdivision":"Washington","timezone":"America/Los_Angeles","postal_code":"98275","continent":"North America","latitude":"47.913","country":"United States"}
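
For anyone consuming the field, here is a minimal Python sketch of reading such a map, assuming it reaches you serialized as a JSON string like the sample above. The helper name parse_geocoded_data and the choice to convert the coordinates to floats are illustrative assumptions, not part of the actual job; keys that are missing simply come back as None, matching the "when available" caveat.

```python
import json

# Field names as listed in this message.
GEOCODED_FIELDS = (
    "continent", "country", "country_code", "subdivision",
    "postal_code", "city", "timezone", "latitude", "longitude",
)


def parse_geocoded_data(raw):
    """Hypothetical helper: return a dict with every expected key.

    Fields absent from the input are set to None.
    """
    data = json.loads(raw) if raw else {}
    parsed = {field: data.get(field) for field in GEOCODED_FIELDS}
    # In the sample the coordinates are strings; convert them when present.
    for key in ("latitude", "longitude"):
        if parsed[key] is not None:
            parsed[key] = float(parsed[key])
    return parsed


sample = (
    '{"city":"Mukilteo","country_code":"US","longitude":"-122.3042",'
    '"subdivision":"Washington","timezone":"America/Los_Angeles",'
    '"postal_code":"98275","continent":"North America",'
    '"latitude":"47.913","country":"United States"}'
)

geo = parse_geocoded_data(sample)
print(geo["city"], geo["country_code"], geo["latitude"], geo["longitude"])
```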

Cheers
Joseph

On Mon, Feb 23, 2015 at 8:24 PM, Oliver Keyes <[email protected]> wrote:

> Gotcha. So, for transparency...what are we calculating? Country? City? :D
>
> On 23 February 2015 at 13:59, Joseph Allemandou
> <[email protected]> wrote:
> > As per the IRC discussion, we won't recompute historical data, but will
> > start computing new values from the deploy time onward.
> > A new "version" field and associated documentation will also be
> > provided, making it possible to track changes over time.
> > Thanks for your input!
> > Best
> >
> >
> > On Mon, Feb 23, 2015 at 4:58 PM, Oliver Keyes <[email protected]> wrote:
> >>
> >> I think it should be fine-ish; it depends what we're calculating. When
> >> you say "geocoded information", what do you mean? Country? City? I
> >> wouldn't expect country to move about a lot in 60 days (which is the
> >> range of our data): I would expect city to.
> >>
> >> What's the status on getting an oozie job or similar to compute going
> >> forward? To me that's more of a priority than historical data.
> >>
> >> On 23 February 2015 at 10:53, Joseph Allemandou
> >> <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > As part of my first assignment, I'll recompute our historical
> >> > webrequest dataset, adding client_ip and geocoded information.
> >> >
> >> > While it seems correct to compute historical client_ip based on the
> >> > existing ip and the x_forwarded_for, using the current state of the
> >> > MaxMind geocoding library to compute historical data is more
> >> > error-prone.
> >> >
> >> > I can either compute it anyway, knowing that there'll be some errors,
> >> > or put null values for data older than a given point in time.
> >> >
> >> > I'll launch the script to recompute the data as soon as max(a
> >> > consensus is found on this matter, operations gives me the right to
> >> > run the script) :)
> >> >
> >> > Thanks
> >> > --
> >> > Joseph Allemandou
> >> > Data Engineer @ Wikimedia Foundation
> >> > IRC: joal
> >> >
> >> > _______________________________________________
> >> > Analytics mailing list
> >> > [email protected]
> >> > https://lists.wikimedia.org/mailman/listinfo/analytics
> >> >
> >>
> >>
> >>
> >> --
> >> Oliver Keyes
> >> Research Analyst
> >> Wikimedia Foundation
> >>
> >
> >
> >
> >
> > --
> > Joseph Allemandou
> > Data Engineer @ Wikimedia Foundation
> > IRC: joal
> >
> >
>
>
>
> --
> Oliver Keyes
> Research Analyst
> Wikimedia Foundation
>
>



-- 
*Joseph Allemandou*
Data Engineer @ Wikimedia Foundation
IRC: joal
