Hello, I am trying to figure out the best search strategy for my situation and am looking for advice. I will be processing short bits of text (Tweets for example), and need to search them to see if they certain terms. The list of terms is a set of locations (towns cities) and is quite long, approximately 500 different entries, and terms can contain spaces.
My typical approach would be to index each Tweet and then search the resulting document index for each search term. However, I'm not sure this is the best solution in this situation for two reasons: first, the list of locations is quite long so we are talking about a large number of queries, which may grow even larger so I see scalability issues. Second, my Tweet index is not stable as I am just interested in each Tweet as it comes in, and can discard it after, so I have no need really to index each entry. It is actually my list of locations which is stable and searchable. My thought is to do some kind of reverse search, in which I index the locations, and then I pass each Tweet to that index as my query. I am not exactly sure how to go about this though in a way that will do the search in the way I want. I am also concerned about locations that contain spaces and how to have these recognised. As an example, if my locations list is as follows: {"New York", "Chicago", "Los Angeles"} and my text is the following: "Fire burning in Los Angeles", I would like to be able to send that _text_ as query to my indexed location list, and get a hit. Is this something that is doable, or does someone envision a different approach to the problem? Thanks for your time. Mark Ferguson