Reverse Search

Mark Ferguson Mon, 01 Mar 2010 12:36:14 -0800

Hello,

I am trying to figure out the best search strategy for my situation and am
looking for advice. I will be processing short bits of text (Tweets, for
example) and need to search them to see if they contain certain terms. The
list of terms is a set of locations (towns, cities) and is quite long,
approximately 500 entries, and some terms contain spaces.


My typical approach would be to index each Tweet and then search the
resulting index once for each search term. However, I'm not sure this is the
best solution here, for two reasons. First, the list of locations is long, so
that means a large number of queries per Tweet, and since the list may grow
even larger, I see scalability issues. Second, the Tweet index would not be
stable: I am only interested in each Tweet as it comes in and can discard it
afterwards, so I have no real need to index each one. It is actually my list
of locations that is stable and searchable.
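For comparison, the per-term approach described above can be sketched in a few lines of Python. This is a minimal sketch, not Lucene: it assumes a simple lowercase tokenization on letters and digits, and `tokenize`, `contains_phrase` and `naive_match` are hypothetical helpers. It makes the scaling concern visible, because every Tweet is checked against every location in turn.

```python
import re

def tokenize(text):
    # Assumption: lowercase the text and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(tokens, phrase_tokens):
    # True if phrase_tokens occurs as a contiguous run inside tokens.
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens for i in range(len(tokens) - n + 1))

def naive_match(tweet, locations):
    # One "query" per location, so the work per Tweet grows with len(locations).
    tokens = tokenize(tweet)
    return [loc for loc in locations if contains_phrase(tokens, tokenize(loc))]

print(naive_match("Fire burning in Los Angeles", ["New York", "Chicago", "Los Angeles"]))
```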

My thought is to do some kind of reverse search, in which I index the
locations and then pass each Tweet to that index as my query. I am not sure
how to go about this in a way that returns the matches I want, though. I am
also concerned about locations that contain spaces and how to have these
recognised.

As an example, if my locations list is {"New York", "Chicago",
"Los Angeles"} and my text is "Fire burning in Los Angeles", I would like to
be able to send that _text_ as a query to my indexed location list and get a
hit.
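One way to get this reverse behaviour, sketched here in plain Python rather than with Lucene, is to index the locations once as token sequences and then look up every short run of consecutive words (word n-grams, or "shingles") taken from the incoming Tweet. Multi-word names such as "Los Angeles" are handled because the lookup key is the whole token sequence. The class `LocationIndex` and its tokenization rule are hypothetical, for illustration only. The number of lookups per Tweet depends on the Tweet's length and on the longest location name, not on how many locations are indexed.

```python
import re

def tokenize(text):
    # Assumption: lowercase the text and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

class LocationIndex:
    """Hypothetical in-memory 'reverse' index: locations are indexed once,
    and each Tweet is used as the query."""

    def __init__(self, locations):
        # Map each location's token tuple back to its original spelling.
        self.phrases = {}
        for loc in locations:
            key = tuple(tokenize(loc))
            if key:
                self.phrases[key] = loc
        # The longest location (in tokens) bounds the n-gram sizes to try.
        self.max_len = max((len(k) for k in self.phrases), default=0)

    def match(self, tweet):
        tokens = tokenize(tweet)
        found = []
        # Probe every n-gram of the Tweet up to max_len tokens with one hash lookup.
        for i in range(len(tokens)):
            for n in range(1, self.max_len + 1):
                if i + n > len(tokens):
                    break
                loc = self.phrases.get(tuple(tokens[i:i + n]))
                if loc is not None and loc not in found:
                    found.append(loc)
        return found

index = LocationIndex(["New York", "Chicago", "Los Angeles"])
print(index.match("Fire burning in Los Angeles"))
```

With the example list, the call above should report "Los Angeles". Overlapping names (say, a hypothetical "York" alongside "New York") would both be reported; keeping only the longest match at each position is one possible refinement.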

Is this something that is doable, or does someone envision a different
approach to the problem? Thanks for your time.

Mark Ferguson
