A colleague of mine just remarked that indexing for geographical retrieval is a solved problem. One algorithm is described in this book, Machine Learning by Tom Mitchell: http://www.amazon.com/exec/obidos/ASIN/0070428077/qid=1099244886/sr=2-1/ref=pd_ka_b_2_1/102-6518692-8636163
The algorithm in question is a version of k-nearest neighbor search, which my colleague has seen used for search by at least one commercial company. It's an expensive book -- the algorithm can likely be found for free via online searches. There are also probably technical discussions and specs in the more specialized geographical information systems literature.

Chuck

> -----Original Message-----
> From: Chuck Williams [mailto:[EMAIL PROTECTED]
> Sent: Sunday, October 31, 2004 9:41 AM
> To: Lucene Developers List
> Cc: [EMAIL PROTECTED]
> Subject: RE: GIS
>
> I for one would love to have this functionality, i.e. would use it
> immediately if available and efficient. It seems the biggest problem is
> how you are going to index the information. If you store and index the
> latitude and longitude for a geographically-positioned document, and
> then want to find all such documents within a spherical rectangle or
> circle, how do you find the candidates? As far as I know, Lucene does
> range searches now by expanding a range into a list of all possible
> values within that range. This is clearly not a reasonable approach for
> latitudes and longitudes, assuming you need precision on the values
> (which I do). There are potentially reasonable indexing approaches that
> occur to me (e.g. in addition to precise lat/lon, store with each object
> its grid label in a few different discrete lat/lon grids, or use a
> b-tree index of some kind), but this is probably a solved problem
> somewhere in the field of geographical information systems.
>
> After the indexing, the next interesting question would seem to be the
> scoring, although this seems a much simpler issue. E.g., a score
> related to the distance from the center of the query region would seem
> to be appropriate. There should be a mechanism analogous to the current
> coord so that this could be tuned or turned off, depending on the needs
> of particular queries within the application.
>
> My $0.02,
>
> Chuck
>
> > -----Original Message-----
> > From: Guillermo Payet [mailto:[EMAIL PROTECTED]
> > Sent: Sunday, October 31, 2004 9:34 AM
> > To: [EMAIL PROTECTED]
> > Cc: [EMAIL PROTECTED]
> > Subject: GIS
> >
> > Hello,
> >
> > I'm new here, so first of all I'd like to say hello to everyone.
> >
> > So, hi there...
> >
> > I just spent two days trying to get Lucene to handle "geographically
> > constricted" searches for our website. (Check out www.localharvest.org)
> >
> > I got close, but no cigar. (It works, but is very slow.)
> >
> > We need to be able to do searches only within a geographically limited
> > set of documents. (In this case, our member listings.)
> >
> > So... I'd like to volunteer to add the needed functions in Lucene
> > to:
> >
> >   - build a LatLonField class for geographical coordinates
> >   - build a LatLonRectTerm (or whatever) to define matches
> >     within a rectangle defined by latitude/longitude.
> >   - build a LatLonRadiusTerm (or whatever) to define all matches
> >     within X distance from a point (lat,lon).
> >
> > We're now doing all of this through MySQL, which works "ok", but leaves
> > a lot to be desired for the relevance of search results for a lot of
> > searches. I've already written all the spherical trig functions
> > to do these searches accurately, and I'd love to port them into
> > Lucene.
> >
> > So my questions are:
> >
> >   - Has there been any talk about doing this before?
> >   - Is this a bad idea for any reason?
> >   - What would be the right approach to do this?
> >
> > The fact that Lucene stores and indexes (or so it seems) all terms
> > as Strings and that there is no NumericTerm makes me think that I
> > might be missing something and that this might be a much bigger deal
> > than I think?
> >
> > --G
> >
> > --
> > Guillermo Payet
> > L O C A L H A R V E S T
> > http://www.localharvest.org
> >
> > Every morning I awake torn between a desire to save the world and
> > an inclination to savor it. This makes it hard to plan the day.
> >
> > -E.B. White
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
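
To make the grid-label idea from the thread concrete, here is a minimal sketch in Python (not Lucene code, and not the algorithm from the Mitchell book). It stores each document under a discrete grid-cell label that could be indexed as an ordinary string term, collects candidates from the cells overlapping the query circle's bounding box, filters them with an exact great-circle (haversine) distance, and scores each hit by its closeness to the query center. All names here (GridIndex, grid_label, covering_labels, the 0.5-degree cell size) are hypothetical, a single grid is used rather than several resolutions, and the sketch assumes small radii away from the poles and the 180-degree meridian.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cell(lat, lon, cell_deg):
    """(row, col) of the grid cell containing the point."""
    return (math.floor((lat + 90.0) / cell_deg),
            math.floor((lon + 180.0) / cell_deg))


def grid_label(lat, lon, cell_deg):
    """Hypothetical string label for a cell, usable as an exact-match term."""
    row, col = cell(lat, lon, cell_deg)
    return f"{row}:{col}"


def covering_labels(lat, lon, radius_km, cell_deg):
    """Labels of every cell overlapping the bounding box of the query circle.

    Assumes the circle does not cross the 180-degree meridian.
    """
    r = radius_km / EARTH_RADIUS_KM          # angular radius in radians
    dlat = math.degrees(r)
    ratio = math.sin(r) / max(math.cos(math.radians(lat)), 1e-12)
    dlon = math.degrees(math.asin(ratio)) if ratio < 1.0 else 180.0
    row_lo, col_lo = cell(max(lat - dlat, -90.0), lon - dlon, cell_deg)
    row_hi, col_hi = cell(min(lat + dlat, 90.0), lon + dlon, cell_deg)
    return {f"{row}:{col}"
            for row in range(row_lo, row_hi + 1)
            for col in range(col_lo, col_hi + 1)}


class GridIndex:
    """In-memory stand-in for an inverted index keyed by grid label."""

    def __init__(self, cell_deg=0.5):
        self.cell_deg = cell_deg
        self.postings = defaultdict(list)  # label -> [(doc_id, lat, lon)]

    def add(self, doc_id, lat, lon):
        self.postings[grid_label(lat, lon, self.cell_deg)].append((doc_id, lat, lon))

    def search_radius(self, lat, lon, radius_km):
        """Return (doc_id, score) pairs, best first; score is 1 at the center
        and falls linearly to 0 at the edge. Assumes radius_km > 0."""
        hits = []
        for label in covering_labels(lat, lon, radius_km, self.cell_deg):
            for doc_id, dlat, dlon in self.postings.get(label, ()):
                d = haversine_km(lat, lon, dlat, dlon)  # exact filter step
                if d <= radius_km:
                    hits.append((doc_id, 1.0 - d / radius_km))
        return sorted(hits, key=lambda h: h[1], reverse=True)


index = GridIndex(cell_deg=0.5)
index.add("sf", 37.7749, -122.4194)
index.add("oakland", 37.8044, -122.2712)
index.add("la", 34.0522, -118.2437)
print(index.search_radius(37.7749, -122.4194, 25.0))
```

The point of the design is that candidate lookup only needs exact matches on a handful of labels, so no numeric range has to be expanded into individual values; the precise coordinates are used only for the final distance check and the score.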