Hi Fred,

I was working with 150 million point records and 150,000 fairly detailed polygons. I had to batch it up and hold 40,000 polygons in memory at a time in the MapReduce jobs.
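For reference, the in-memory test described further down this thread (polygons in an RTree index via the GeoTools libraries, then an exact contains() check per point) looks roughly like the sketch below. This is only a minimal illustration using the JTS STRtree that GeoTools bundles; the class and method names are made up for the example, not the actual job code.

import java.util.ArrayList;
import java.util.List;

import com.vividsolutions.jts.geom.Coordinate;
import com.vividsolutions.jts.geom.Geometry;
import com.vividsolutions.jts.geom.GeometryFactory;
import com.vividsolutions.jts.geom.Point;
import com.vividsolutions.jts.index.strtree.STRtree;

// Illustrative helper, not the actual MapReduce job: index one batch of
// polygons (e.g. 40,000 at a time) and test each point against it.
public class PolygonBatchIndex {

  private final STRtree index = new STRtree();
  private final GeometryFactory gf = new GeometryFactory();

  // Index each polygon of the current batch by its bounding envelope.
  public void addPolygons(List<Geometry> polygonBatch) {
    for (Geometry polygon : polygonBatch) {
      index.insert(polygon.getEnvelopeInternal(), polygon);
    }
  }

  // The R-tree query filters by bounding box only, so confirm each
  // candidate with an exact contains() test before emitting a match.
  public List<Geometry> polygonsContaining(double lon, double lat) {
    Point p = gf.createPoint(new Coordinate(lon, lat));
    List<Geometry> hits = new ArrayList<Geometry>();
    for (Object candidate : index.query(p.getEnvelopeInternal())) {
      Geometry polygon = (Geometry) candidate;
      if (polygon.contains(p)) {
        hits.add(polygon);
      }
    }
    return hits;
  }
}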
If you are dealing with a whole bunch of points, might it be worth clustering them into polygons first to get candidate points? We are running this: http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/ and clustering 1 million points into multipolygons in 5 seconds. This might get the candidate set down to a sensible size (a rough sketch of the idea is at the end of this message, below the quoted thread).

It is a problem of great interest to us also, so happy to discuss ideas...

http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html was one of my early tests.

Cheers
Tim

On Fri, Jun 19, 2009 at 9:37 PM, Fred Zappert <[email protected]> wrote:
> Tim,
>
> Thanks. That suggests an implementation that could be very effective at the
> current scale.
>
> Regards,
>
> Fred.
>
> On Fri, Jun 19, 2009 at 2:27 PM, tim robertson <[email protected]> wrote:
>
>> I've used it as a source for a bunch of point data, and then tested
>> them in polygons with a contains(). I ended up loading the polygons
>> into memory with an RTree index, though, using the GeoTools libraries.
>>
>> Cheers
>>
>> Tim
>>
>>
>> On Fri, Jun 19, 2009 at 9:22 PM, Fred Zappert <[email protected]> wrote:
>> > Hi,
>> >
>> > I would like to know if anyone is using HBase for spatial databases.
>> >
>> > The requirements are relatively simple.
>> >
>> > 1. Two dimensions.
>> > 2. Each object represented as a point.
>> > 3. Basic query is nearest neighbor, with a few qualifications such as:
>> >    a
>> >
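Here is the rough sketch of the "cluster first, then join" idea mentioned above. This is not the alpha-shape code from the Flickr post; it is just the simplest JTS stand-in (convex hulls over coarse grid buckets, with an assumed cell size and made-up names) to show how candidate shapes could be built before the expensive polygon join.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.vividsolutions.jts.geom.Coordinate;
import com.vividsolutions.jts.geom.Geometry;
import com.vividsolutions.jts.geom.GeometryFactory;

// Illustrative only: bucket points into coarse grid cells and take the convex
// hull of each bucket as a candidate shape. Alpha shapes (as in the Flickr
// post) hug the points much more tightly; this just shows the filtering step.
public class PointClusters {

  // Assumption for the sketch: 1x1 degree buckets.
  private static final double CELL_SIZE_DEGREES = 1.0;

  public static List<Geometry> candidateShapes(List<Coordinate> points) {
    GeometryFactory gf = new GeometryFactory();
    Map<String, List<Coordinate>> buckets = new HashMap<String, List<Coordinate>>();
    for (Coordinate c : points) {
      long cx = (long) Math.floor(c.x / CELL_SIZE_DEGREES);
      long cy = (long) Math.floor(c.y / CELL_SIZE_DEGREES);
      String key = cx + ":" + cy;
      List<Coordinate> bucket = buckets.get(key);
      if (bucket == null) {
        bucket = new ArrayList<Coordinate>();
        buckets.put(key, bucket);
      }
      bucket.add(c);
    }
    List<Geometry> shapes = new ArrayList<Geometry>();
    for (List<Coordinate> bucket : buckets.values()) {
      // Convex hull of the bucket; degenerates to a point or a line for very
      // small buckets, which is still fine as a coarse candidate filter.
      Geometry hull = gf.createMultiPoint(bucket.toArray(new Coordinate[0])).convexHull();
      shapes.add(hull);
    }
    return shapes;
  }
}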
