Tim,

Thanks so much for the additional links.

Our problem is for the moment much smaller - 4,000,000 mapped way-points
and 80,000 moving vehicles. Clustering the way-points into polygons makes
a lot of sense.

Fred.

On Fri, Jun 19, 2009 at 2:43 PM, tim robertson <[email protected]> wrote:
> Hi Fred,
>
> I was working on 150 million point records and 150,000 fairly detailed
> polygons. I had to batch it up and do 40,000 polygons in memory at a
> time on the MapReduce jobs.
>
> If you are dealing with a whole bunch of points, might it be worth
> clustering them into polygons first to get candidate points?
> We are running this:
> http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/ and
> clustering 1 million points into multipolygons in 5 seconds. This
> might get the numbers down to a sensible size.
>
> It is a problem of great interest to us also, so we are happy to discuss
> ideas...
> http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html
> was one of my early tests.
>
> Cheers,
>
> Tim
>
> On Fri, Jun 19, 2009 at 9:37 PM, Fred Zappert <[email protected]> wrote:
> > Tim,
> >
> > Thanks. That suggests an implementation that could be very effective at
> > the current scale.
> >
> > Regards,
> >
> > Fred.
> >
> > On Fri, Jun 19, 2009 at 2:27 PM, tim robertson <[email protected]> wrote:
> >
> >> I've used it as a source for a bunch of point data, and then tested
> >> them in polygons with a contains(). I ended up loading the polygons
> >> into memory with an R-tree index, though, using the GeoTools libraries.
> >>
> >> Cheers,
> >>
> >> Tim
> >>
> >> On Fri, Jun 19, 2009 at 9:22 PM, Fred Zappert <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > I would like to know if anyone is using HBase for spatial databases.
> >> >
> >> > The requirements are relatively simple:
> >> >
> >> > 1. Two dimensions.
> >> > 2. Each object represented as a point.
> >> > 3. Basic query is nearest neighbor, with a few qualifications such as:
> >> >    a
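[Editor's note: the approach Tim describes in the thread - load a batch of polygons into an in-memory spatial index, then test each point with contains() - can be sketched roughly as below. This is a minimal pure-Python illustration, not the GeoTools/JTS code from the thread: a uniform grid stands in for the R-tree, the ray-casting test stands in for JTS contains(), and all polygon data and names are invented for the example.]

```python
# Sketch of "index polygons in memory, then test candidate points".
# A uniform grid replaces the R-tree mentioned in the thread; the
# polygon data below is made up purely for illustration.

def point_in_polygon(x, y, poly):
    """Ray-casting point-in-polygon test; poly is a list of (x, y) vertices."""
    inside = False
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        # Count edge crossings of a horizontal ray extending right from (x, y).
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

class GridIndex:
    """Bucket polygons by the grid cells their bounding boxes overlap,
    so a point lookup only tests the polygons sharing its cell."""

    def __init__(self, cell=1.0):
        self.cell = cell
        self.buckets = {}  # (cx, cy) -> list of (poly_id, vertices)

    def insert(self, poly_id, poly):
        xs = [p[0] for p in poly]
        ys = [p[1] for p in poly]
        c = self.cell
        for cx in range(int(min(xs) // c), int(max(xs) // c) + 1):
            for cy in range(int(min(ys) // c), int(max(ys) // c) + 1):
                self.buckets.setdefault((cx, cy), []).append((poly_id, poly))

    def containing(self, x, y):
        """Ids of indexed polygons that actually contain point (x, y)."""
        key = (int(x // self.cell), int(y // self.cell))
        candidates = self.buckets.get(key, [])
        return [pid for pid, poly in candidates if point_in_polygon(x, y, poly)]

if __name__ == "__main__":
    idx = GridIndex(cell=1.0)
    idx.insert("square-a", [(0, 0), (2, 0), (2, 2), (0, 2)])
    idx.insert("square-b", [(10, 10), (11, 10), (11, 11), (10, 11)])
    print(idx.containing(1.0, 1.0))  # → ['square-a']
```

In a MapReduce job along the lines Tim describes, each mapper would build one such index from its batch of polygons and stream points through `containing()`, emitting (polygon id, point) pairs.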
