Hi Fred,

So I am guessing your "real time" calculations are all going to be focused on the moving vehicles, right? If the way-points are relatively static, you can preprocess information about them offline (distance between each pair, mining historical data for the average time taken to travel between two, etc.).
I am also guessing you will need to find way-points relative to a given vehicle - if that is the case, I think you are going to need to investigate some kind of index for the way-points. We do this for our 150 million points by assigning each one to an identified 1 degree x 1 degree cell (and then to 0.1 x 0.1 degree cells), so that when someone is interested in points near a location, we first determine which cells are candidates and immediately we have reduced the candidate points to check. In database terms, we have latitude and longitude and then create a (cell_id int, centi_cell_id int).

If you know the routes that a vehicle is taking, is there any way you could perhaps preplan its route and cache that, or somehow store known routes between way-points? This might allow you to really reduce the candidates to check.

Just some ideas,

Tim
skype: timrobertson100

On Fri, Jun 19, 2009 at 10:16 PM, Fred Zappert<[email protected]> wrote:
> Tim,
>
> Thanks so much for the additional links.
>
> Our problem is for the moment much smaller - 4,000,000 mapped way-points,
> and 80,000 moving vehicles.
>
> Clustering the way-points into polygons makes a lot of sense.
>
> Fred.
>
> On Fri, Jun 19, 2009 at 2:43 PM, tim robertson
> <[email protected]> wrote:
>
>> Hi Fred,
>>
>> I was working on 150 million point records, and 150,000 fairly detailed
>> polygons. I had to batch it up and do 40,000 polygons in memory at a
>> time on the MapReduce jobs.
>>
>> If you are dealing with a whole bunch of points, might it be worth
>> clustering them into polygons first to get candidate points?
>> We are running this:
>> http://code.flickr.com/blog/2008/10/30/the-shape-of-alpha/ and
>> clustering 1 million points into multipolygons in 5 seconds. This
>> might get the numbers down to a sensible number.
>>
>> It is a problem of great interest to us also, so happy to discuss
>> ideas...
>> http://biodivertido.blogspot.com/2008/11/reproducing-spatial-joins-using-hadoop.html
>> was one of my early tests.
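In case it helps, the cell / centi-cell idea above could be sketched roughly like this (Python just for illustration; the row-major numbering of the ids is my assumption, not necessarily what our system actually uses):

```python
import math

def cell_ids(lat, lon):
    """Return (cell_id, centi_cell_id) for a point given in degrees.

    cell_id identifies the 1 x 1 degree cell (numbered row-major over the
    180 x 360 grid); centi_cell_id identifies the 0.1 x 0.1 degree sub-cell
    within that cell (0..99). The numbering scheme is illustrative only.
    """
    # Shift to positive ranges: lat -90..90 -> 0..180, lon -180..180 -> 0..360
    row = int(math.floor(lat + 90))        # 1-degree row, 0..179
    col = int(math.floor(lon + 180))       # 1-degree column, 0..359
    cell_id = row * 360 + col
    # 0.1-degree sub-cell position inside the 1-degree cell
    sub_row = int(math.floor((lat + 90) * 10)) % 10
    sub_col = int(math.floor((lon + 180) * 10)) % 10
    centi_cell_id = sub_row * 10 + sub_col
    return cell_id, centi_cell_id
```

A "points near X" query then first computes the candidate cell_ids around X and only does exact distance checks on points in those cells.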
>>
>> Cheers
>>
>> Tim
>>
>>
>> On Fri, Jun 19, 2009 at 9:37 PM, Fred Zappert<[email protected]> wrote:
>> > Tim,
>> >
>> > Thanks. That suggests an implementation that could be very effective at
>> > the current scale.
>> >
>> > Regards,
>> >
>> > Fred.
>> >
>> > On Fri, Jun 19, 2009 at 2:27 PM, tim robertson <
>> > [email protected]> wrote:
>> >
>> >> I've used it as a source for a bunch of point data, and then tested
>> >> them in polygons with a contains(). I ended up loading the polygons
>> >> into memory with an RTree index though, using the GeoTools libraries.
>> >>
>> >> Cheers
>> >>
>> >> Tim
>> >>
>> >>
>> >> On Fri, Jun 19, 2009 at 9:22 PM, Fred Zappert<[email protected]> wrote:
>> >> > Hi,
>> >> >
>> >> > I would like to know if anyone is using HBase for spatial databases.
>> >> >
>> >> > The requirements are relatively simple.
>> >> >
>> >> > 1. Two dimensions.
>> >> > 2. Each object represented as a point.
>> >> > 3. Basic query is nearest neighbor, with a few qualifications such as:
>> >> > a
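P.S. For what it's worth, the candidate-cell lookup for a nearest-neighbor query could look something like this in memory (a toy sketch: it scans the 3x3 block of 1-degree cells around the query and ignores longitude wrap-around at +/-180; a real system would instead filter on the stored cell_id column):

```python
import math
from collections import defaultdict

def cell_of(lat, lon):
    # 1-degree cell as a (row, col) pair; same shifting as the cell_id scheme
    return (int(math.floor(lat + 90)), int(math.floor(lon + 180)))

class CellIndex:
    """Toy in-memory index: bucket points by 1-degree cell."""

    def __init__(self):
        self.cells = defaultdict(list)

    def add(self, lat, lon, payload):
        self.cells[cell_of(lat, lon)].append((lat, lon, payload))

    def near(self, lat, lon):
        """Return candidate points from the 3x3 cells around (lat, lon).

        Callers still do exact distance checks on these candidates to find
        the true nearest neighbor.
        """
        row, col = cell_of(lat, lon)
        out = []
        for r in (row - 1, row, row + 1):
            for c in (col - 1, col, col + 1):
                out.extend(self.cells.get((r, c), []))
        return out
```

The point is just that the index reduces 4,000,000 way-points to the handful that share nearby cells before any expensive distance math runs.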
