Hi Fred,

I was working on 150 million point records and 150,000 fairly detailed
polygons.  I had to batch it up and process 40,000 polygons in memory
at a time in the MapReduce jobs.
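
In case it is useful, that kind of per-batch lookup can be sketched
roughly like this (plain JTS, which GeoTools builds on; package names
are the current org.locationtech ones, the MapReduce plumbing is left
out, and the class/method names are just illustrative):

// A sketch only: load one batch of polygons into an in-memory STRtree
// (JTS's R-tree variant) and test each point with contains().
// The MapReduce/batching plumbing is omitted and the names are illustrative.
import java.util.List;

import org.locationtech.jts.geom.Coordinate;
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.geom.GeometryFactory;
import org.locationtech.jts.geom.Point;
import org.locationtech.jts.index.strtree.STRtree;

public class PointInPolygonBatch {

  private final STRtree index = new STRtree();
  private final GeometryFactory gf = new GeometryFactory();

  // Index one batch of polygons (e.g. 40,000 at a time) by their envelopes.
  public void indexPolygons(List<Geometry> polygons) {
    for (Geometry polygon : polygons) {
      index.insert(polygon.getEnvelopeInternal(), polygon);
    }
    index.build();
  }

  // Return the first indexed polygon that really contains the point, or null.
  public Geometry locate(double x, double y) {
    Point p = gf.createPoint(new Coordinate(x, y));
    for (Object o : index.query(p.getEnvelopeInternal())) {
      Geometry candidate = (Geometry) o;
      if (candidate.contains(p)) {  // exact test after the cheap envelope filter
        return candidate;
      }
    }
    return null;
  }
}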

If you are dealing with a whole bunch of points, might it be worth
clustering them into polygons first to get candidate points?
We are running this:
http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/ and
clustering 1 million points into multipolygons in 5 seconds.  This
might bring the candidate count down to a sensible size.
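
The Flickr code does proper alpha shapes; purely to illustrate the
prefilter idea, a far cruder version with convex hulls in JTS would
look something like:

// A sketch only: a much cruder stand-in for the alpha-shape step, just to
// show the prefilter idea - collapse a cluster of points into a hull and
// only run the exact per-point tests against polygons the hull intersects.
import org.locationtech.jts.algorithm.ConvexHull;
import org.locationtech.jts.geom.Coordinate;
import org.locationtech.jts.geom.Geometry;
import org.locationtech.jts.geom.GeometryFactory;

public class ClusterHull {

  private static final GeometryFactory GF = new GeometryFactory();

  // Convex hull of a cluster of points (a rougher shape than an alpha shape).
  public static Geometry hullOf(Coordinate[] clusterPoints) {
    return new ConvexHull(clusterPoints, GF).getConvexHull();
  }

  // Only clusters whose hull overlaps the polygon need exact contains() tests.
  public static boolean isCandidate(Geometry clusterHull, Geometry polygon) {
    return clusterHull.intersects(polygon);
  }
}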

It is a problem of great interest to us also, so happy to discuss
ideas... 
http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html
was one of my early tests.

Cheers

Tim


On Fri, Jun 19, 2009 at 9:37 PM, Fred Zappert <[email protected]> wrote:
> Tim,
>
> Thanks. That suggests an implementation that could be very effective at the
> current scale.
>
> Regards,
>
> Fred.
>
> On Fri, Jun 19, 2009 at 2:27 PM, tim robertson <[email protected]> wrote:
>
>> I've used it as a source for a bunch of point data, and then tested
>> them in polygons with a contains().  I ended up loading the polygons
>> into memory with an RTree index, though, using the GeoTools libraries.
>>
>> Cheers
>>
>> Tim
>>
>>
>> On Fri, Jun 19, 2009 at 9:22 PM, Fred Zappert <[email protected]> wrote:
>> > Hi,
>> >
>> > I would like to know if anyone is using HBase for spatial databases.
>> >
>> > The requirements are relatively simple.
>> >
>> > 1. Two dimensions.
>> > 2. Each object represented as a point.
>> > 3. Basic query is nearest neighbor, with a few qualifications such as:
>> > a
>> >
>>
>
