Heh, actually I'm using Perl, but I've always associated text search with Lucene; I'm not sure whether it's the best solution. On the small side there are 1.6 million keywords; on the large side there are well over 100 million, but I might find a way to break the searches down into smaller ones (send A-G to server1, H-R to server2, etc.).
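The letter-range split could be expressed as a small routing function. This is a minimal Python sketch, assuming a hypothetical three-server layout: `SHARDS`, the server names and the catch-all `server_misc` are illustrative, not part of any existing setup.

```python
# Hypothetical shard layout: inclusive first-letter ranges mapped to server names.
SHARDS = [("a", "g", "server1"), ("h", "r", "server2"), ("s", "z", "server3")]


def shard_for(phrase):
    """Return the server that owns a keyword or phrase, based on its first letter."""
    first = phrase.strip().lower()[:1]
    for lo, hi, server in SHARDS:
        if lo <= first <= hi:
            return server
    return "server_misc"  # empty strings, digits, punctuation, non-ASCII letters


def partition(keywords):
    """Group lowercased keywords by owning server, e.g. one in-memory set per server."""
    groups = {}
    for kw in keywords:
        groups.setdefault(shard_for(kw), set()).add(kw.lower())
    return groups
```

Because keywords are split by first letter, each candidate phrase taken from a query has to go to the server owning its first letter, so a single query such as "the girl likes red balls" may fan out to several servers.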

Is there another search tool that might be better suited for this? The only thing I can relate it to is how AdWords works: a user enters a query in the Google search box, and Google searches its database for advertisers who've purchased those keywords in order to show the appropriate ads. What I'm doing is similar, but without the payday. :-{

Currently I'm using a (huge) hash table and one regular expression per keyword ($query =~ /$keyword/), going down the list from largest to smallest, but I know this is not a long-term solution, especially if I have to load the large 100-million-plus list.
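A common alternative to running one regex per keyword (not something proposed in this thread) is to turn the lookup around: split the query into words and check every run of up to N words against a hash set of keywords, where N is the length of the longest keyword. The work per query then depends on the query length and N rather than on the size of the keyword list. A minimal Python sketch; `tokenize`, `build_index` and `find_keywords` are hypothetical names, and the simple word tokenizer is an assumption.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(keywords):
    """Store each keyword as a tuple of words; remember the longest keyword (in words)."""
    index = set()
    max_len = 0
    for kw in keywords:
        words = tuple(tokenize(kw))
        if words:
            index.add(words)
            max_len = max(max_len, len(words))
    return index, max_len


def find_keywords(query, index, max_len):
    """Return every keyword that occurs as a contiguous run of words in the query."""
    words = tokenize(query)
    found = []
    for start in range(len(words)):
        # Only windows no longer than the longest keyword can possibly match.
        for length in range(1, min(max_len, len(words) - start) + 1):
            gram = tuple(words[start:start + length])
            if gram in index:
                found.append(" ".join(gram))
    return found
```

The set still has to hold every keyword in memory, so the 100-million-plus list would likely still need splitting across machines. Matching here is exact, so "red balls" does not match "red ball"; listing the longest matches first is just a matter of sorting the result by length.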

Thanks.


On Jul 23, 2008, at 3:54 PM, Steven A Rowe wrote:

Hi Ryan,

I'm not sure Lucene's the right tool for this job.

I have used regular expressions and ternary search trees in the past to do similar things.
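A ternary search tree stores keywords one character per level, with each node having lower, equal and higher children. Below is a minimal Python sketch of the general idea; the class and function names are illustrative, not from any particular library. It tries a match at the start of every word in the query and only checks the word boundary at the start of a match, so, like `$query =~ /$keyword/`, "red ball" also matches inside "red balls".

```python
class Node:
    __slots__ = ("ch", "lo", "eq", "hi", "end")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.end = False  # True if a stored keyword ends at this node


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, word):
        """Add a keyword, following 'eq' links one character at a time."""
        if not word:
            return
        if self.root is None:
            self.root = Node(word[0])
        node, i = self.root, 0
        while True:
            ch = word[i]
            if ch < node.ch:
                if node.lo is None:
                    node.lo = Node(ch)
                node = node.lo
            elif ch > node.ch:
                if node.hi is None:
                    node.hi = Node(ch)
                node = node.hi
            else:
                i += 1
                if i == len(word):
                    node.end = True
                    return
                if node.eq is None:
                    node.eq = Node(word[i])
                node = node.eq

    def prefixes_in(self, text, start):
        """Yield each end index j such that text[start:j] is a stored keyword."""
        node, i = self.root, start
        while node is not None and i < len(text):
            ch = text[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            else:
                i += 1
                if node.end:
                    yield i
                node = node.eq


def find_all(tree, text):
    """Return every stored keyword that starts at a word boundary in the text."""
    text = text.lower()
    hits = []
    for start in range(len(text)):
        # Only begin a match at the start of a word.
        if start > 0 and text[start - 1].isalnum():
            continue
        for end in tree.prefixes_in(text, start):
            hits.append(text[start:end])
    return hits
```

Keywords should be inserted in lowercase, since `find_all` lowercases the query before walking the tree.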

Is the set of keywords too large for an in-memory solution like these? If not, consider using a tool like the Perl package Regex::PreSuf <http://search.cpan.org/dist/Regex-PreSuf/> - it can convert a list of strings into a compact set of alternations, which you can then import into a Java program. (I'm not aware of any similar Java tools.)
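The idea behind a tool like Regex::PreSuf is to merge strings that share prefixes (and suffixes) into one compact regular expression. The Python sketch below illustrates only the shared-prefix half of that idea, using a character trie; it is not a port of Regex::PreSuf, and its output will not match that module's. `build_trie` and `trie_to_regex` are illustrative names.

```python
import re


def build_trie(words):
    """Nested dicts, one level per character; the key "" marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie


def trie_to_regex(node):
    """Emit a regex string in which keywords sharing a prefix share one branch."""
    alternatives = []
    word_ends_here = False
    for ch in sorted(node):
        if ch == "":
            word_ends_here = True
            continue
        alternatives.append(re.escape(ch) + trie_to_regex(node[ch]))
    if not alternatives:
        return ""
    if len(alternatives) == 1 and not word_ends_here:
        return alternatives[0]
    group = "(?:" + "|".join(alternatives) + ")"
    # If a shorter keyword ends here, everything after this point is optional.
    return group + "?" if word_ends_here else group
```

For example, `["red", "reds", "rod"]` becomes `r(?:ed(?:s)?|od)`. Prefixing the result with `\b` and calling `re.compile(...).search(...)` finds the leftmost keyword in a query. With millions of keywords, a single pattern like this may be slow to compile and large in memory, which is the "too large for an in-memory solution" question above.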

Steve

On 07/23/2008 at 3:30 PM, Ryan Detzel wrote:
Everything I've read and seen about Lucene is about searching for
keywords in documents; I want to do the reverse. I have a huge list of
keywords ("big boy", "red ball", "computer") and I have phrases that I
want to check for those keywords. For example, using the small
keyword list above (stored as documents in Lucene), what's the best
approach to pass in a query "the girl likes red balls" and have it
match the keyword "red ball"?
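This "reverse" search treats the keywords as the indexed items and the phrase as the query. A minimal Python sketch of that with a plain inverted index (not Lucene's API): each keyword's words point back to the keyword, and a keyword matches when all of its distinct words occur in the phrase. `normalize` is a deliberately crude plural stripper standing in for a real stemmer, and it is what lets "balls" match "ball". `ReverseIndex`, `terms` and `normalize` are hypothetical names.

```python
import re
from collections import defaultdict


def normalize(word):
    """Crude plural stripping; an assumption standing in for a real stemmer."""
    if len(word) > 3 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def terms(text):
    """Set of normalized, lowercased word tokens in the text."""
    return {normalize(w) for w in re.findall(r"[a-z0-9']+", text.lower())}


class ReverseIndex:
    def __init__(self, keywords):
        self.keywords = []
        self.sizes = []  # number of distinct terms in each keyword
        self.postings = defaultdict(list)  # term -> ids of keywords containing it
        for kw in keywords:
            ts = terms(kw)
            if not ts:
                continue
            kid = len(self.keywords)
            self.keywords.append(kw)
            self.sizes.append(len(ts))
            for t in ts:
                self.postings[t].append(kid)

    def match(self, query):
        """Return (sorted) every keyword whose terms all occur in the query."""
        counts = defaultdict(int)
        for t in terms(query):
            for kid in self.postings.get(t, ()):
                counts[kid] += 1
        return sorted(self.keywords[k] for k, c in counts.items() if c == self.sizes[k])
```

Word order is not checked, so "a ball that is red" also matches "red ball"; a second pass that verifies word order could filter those out.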

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


