[ https://issues.apache.org/jira/browse/PIG-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000968#comment-14000968 ]

Philip (flip) Kromer commented on PIG-3877:
-------------------------------------------

* This makes separate HTTP calls for the latitude and then the longitude. It 
would be better to have one method that returns a tuple built from the fully 
parsed response and let the caller project what it needs.
* What happens when a response fails to geocode, or for any other reason lacks 
a latLng element? The sequence {{JSONObject latLng = (JSONObject) 
((JSONObject) locations.get(0)).get("latLng"); geolongitude = (String) 
latLng.get("lng");}} looks like a recipe for an NPE.
* Is the Intuit backend ready for people who might use this in production? Or 
even for Apache's and the world's automated build systems to hit it without 
that counting as abuse?
* I worry about having Pig make a network call on every record. There's no 
facility for throttling, backoff, or HTTP keep-alive.
* Even with those, the only way I can imagine making this workable at 
production scale with an over-the-network geocoder would be to deploy an 
instance on each machine. Pete Warden's [Data Science Toolkit|http://petewarden.com/2013/10/06/geocode-the-world-with-the-new-data-science-toolkit/] 
has a [Standalone Geocoder|http://www.datasciencetoolkit.org/developerdocs#googlestylegeocoder]; 
this should target that and refer to it (or an acceptable alternative) in the docs.
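A minimal sketch of the first two points, assuming the response has already been parsed with json-simple (its JSONObject and JSONArray extend HashMap and ArrayList, so they can be read as a plain Map and List). The class and method names are hypothetical, and lat/lng are accepted as either JSON numbers or strings because the response format isn't shown here. One pass over the parsed {{locations}} list yields both coordinates, and a response without a usable latLng comes back as null instead of throwing:

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper: reads both coordinates from one parsed geocoder response. */
public class LatLngExtractor {

    /** Both coordinates together; the caller projects whichever it needs. */
    public static final class LatLng {
        public final double lat;
        public final double lng;

        LatLng(double lat, double lng) {
            this.lat = lat;
            this.lng = lng;
        }
    }

    /**
     * Returns the first location's coordinates, or null if the response did not
     * geocode (null/empty list, missing "latLng", or missing/unparseable lat/lng).
     */
    public static LatLng extract(List<?> locations) {
        if (locations == null || locations.isEmpty()) {
            return null;
        }
        Object first = locations.get(0);
        if (!(first instanceof Map)) {
            return null;
        }
        Object latLng = ((Map<?, ?>) first).get("latLng");
        if (!(latLng instanceof Map)) {
            return null;
        }
        Double lat = toDouble(((Map<?, ?>) latLng).get("lat"));
        Double lng = toDouble(((Map<?, ?>) latLng).get("lng"));
        return (lat == null || lng == null) ? null : new LatLng(lat, lng);
    }

    /** Accepts a JSON number or a numeric string; anything else yields null. */
    private static Double toDouble(Object value) {
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        if (value instanceof String) {
            try {
                return Double.parseDouble((String) value);
            } catch (NumberFormatException e) {
                return null;
            }
        }
        return null;
    }
}
```

In a Pig UDF this could sit behind a single EvalFunc<Tuple> that emits (lat, lng), or null for rows that don't geocode, so the script projects the field it wants rather than issuing a second request.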
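For the throttling and backoff point, a sketch of what the UDF could wrap around each request. The class name, parameters, and delays are illustrative assumptions, not anything in the patch. Note that the throttle only spaces calls within a single JVM, so it does not cap the aggregate rate from a whole cluster, which is another argument for a per-machine geocoder:

```java
import java.util.concurrent.Callable;

/** Hypothetical helpers for spacing out and retrying geocoder requests. */
public final class RateControl {

    private RateControl() {}

    /**
     * Runs call, retrying on any exception with capped exponential backoff
     * (baseDelayMs, 2 * baseDelayMs, ... up to maxDelayMs). Rethrows the last
     * exception once maxAttempts calls have failed.
     */
    public static <T> T withBackoff(Callable<T> call, int maxAttempts,
                                    long baseDelayMs, long maxDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = baseDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delayMs);
                delayMs = Math.min(delayMs * 2, maxDelayMs);
            }
        }
    }

    /** Enforces a minimum gap between successive acquire() calls in this JVM. */
    public static final class Throttle {
        private final long minIntervalNanos;
        private long nextAllowedNanos = System.nanoTime();

        public Throttle(long minIntervalMs) {
            this.minIntervalNanos = minIntervalMs * 1_000_000L;
        }

        /** Blocks until the next request slot; holding the lock serializes callers. */
        public synchronized void acquire() throws InterruptedException {
            long now = System.nanoTime();
            long waitNanos = nextAllowedNanos - now;
            if (waitNanos > 0) {
                Thread.sleep(waitNanos / 1_000_000L, (int) (waitNanos % 1_000_000L));
            }
            nextAllowedNanos = Math.max(now, nextAllowedNanos) + minIntervalNanos;
        }
    }
}
```

On the keep-alive side, java.net.HttpURLConnection already reuses persistent connections by default (the http.keepAlive system property), but only when each response body is read to the end and closed.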

> Getting Geo Latitude/Longitude from Address Lines
> -------------------------------------------------
>
>                 Key: PIG-3877
>                 URL: https://issues.apache.org/jira/browse/PIG-3877
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>    Affects Versions: 0.10.1
>            Reporter: Rekha Joshi
>            Assignee: Rekha Joshi
>              Labels: patch, piggybank
>             Fix For: 0.10.1
>
>         Attachments: PIG-3877.1.patch
>
>
> In many dataset mining use cases, latitude and longitude must be derived 
> from address lines alone, when the IP fields are missing.
> The attached UDFs get the geo latitude/longitude from address lines.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
