Heh, actually I'm using Perl, but I've always associated text search
with Lucene; I'm not sure whether it's the best solution. On the
small side there are 1.6 million keywords; on the large side there are
well over 100 million, but I might find a way to break the searches
down into smaller ones (send A-G to server1, H-R to server2, etc.).
Is there another search tool that might be better suited for
this? The only thing I can relate this to is how AdWords works: a
user enters a query in the Google search box, and Google searches its
database for people who've purchased those keywords in order to show
the appropriate ads. What I'm doing is similar but without the
payday. :-{
Currently I'm using a (huge) hash table and regular expressions
($query =~ /$keyword/), going down the list from largest to smallest,
but I know this is not a long-term solution, especially if I have to
load the large 100 million+ list.
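For illustration, the scan described above might look roughly like
this. It is sketched in Python rather than Perl; the function name,
and the reading of "largest to smallest" as "longest keyword first",
are assumptions and not taken from the actual script.

```python
import re

def first_matching_keyword(query, keywords):
    """Return the longest keyword that occurs inside query, or None."""
    # Assumption: "largest to smallest" means longest keyword first.
    for keyword in sorted(keywords, key=len, reverse=True):
        # re.escape stops punctuation in a keyword from acting as regex syntax.
        if re.search(re.escape(keyword), query):
            return keyword
    return None

keywords = {"big boy", "red ball", "computer"}
match = first_matching_keyword("the girl likes red balls", keywords)
```

Every query may run one regex test per keyword, so the work per query
grows with the size of the keyword list, which is the scaling problem
mentioned above.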
Thanks.
On Jul 23, 2008, at 3:54 PM, Steven A Rowe wrote:
Hi Ryan,
I'm not sure Lucene's the right tool for this job.
I have used regular expressions and ternary search trees in the past
to do similar things.
Is the set of keywords too large for an in-memory solution like
these? If not, consider using a tool like the Perl package
Regex::PreSuf <http://search.cpan.org/dist/Regex-PreSuf/> - it can
convert a list of strings into a compact set of alternations, which
you can then import into a Java program. (I'm not aware of any
similar Java tools.)
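To give a feel for what "a compact set of alternations" means, here is
a minimal sketch in Python that folds shared prefixes of a word list
into one pattern via a character trie. It is not Regex::PreSuf's
actual algorithm or API; build_trie and trie_to_pattern are
hypothetical helper names, and only prefixes (not suffixes) are
merged.

```python
import re

def build_trie(words):
    """Build a character trie; the empty-string key marks a word end."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie

def trie_to_pattern(node):
    """Turn a trie node into a regex fragment matching any word below it."""
    ends_here = "" in node
    branches = [re.escape(ch) + trie_to_pattern(child)
                for ch, child in sorted(node.items()) if ch != ""]
    if not branches:
        return ""
    if len(branches) == 1 and not ends_here:
        return branches[0]
    group = "(?:" + "|".join(branches) + ")"
    # A word may stop at this node, so whatever follows is optional.
    return group + "?" if ends_here else group

pattern = trie_to_pattern(build_trie(["ball", "balls", "bat"]))
matcher = re.compile(trie_to_pattern(build_trie(["big boy", "red ball", "computer"])))
found = matcher.search("the girl likes red balls")
```

Here the three words "ball", "balls" and "bat" share the prefix "ba",
so the pattern spells it once instead of three times. Whether a
pattern built this way stays practical for millions of keywords
depends on the regex engine and would need to be measured.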
Steve
On 07/23/2008 at 3:30 PM, Ryan Detzel wrote:
Everything I've read and seen about Lucene is about searching for
keywords in documents; I want to do the reverse. I have a huge list of
keywords ("big boy", "red ball", "computer") and I have phrases that I
want to check for those keywords. For example, using the small
keyword list above (stored as documents in Lucene), what's the best
approach to pass in a query "the girl likes red balls" and have it
match the keyword "red ball"?
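One possible way to frame this reverse lookup, independent of Lucene,
is to break the query into runs of consecutive words and look each run
up in a hash of keywords. The sketch below is in Python and rests on
loud assumptions: normalize() is a hypothetical and deliberately naive
helper (lowercase, drop one trailing "s") so that "balls" lines up
with "ball", and keywords_in_query() is an illustrative name, not an
existing API.

```python
def normalize(word):
    # Hypothetical, naive normalisation: lowercase and drop one trailing "s".
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 1 else word

def keywords_in_query(query, keywords):
    """Return the keywords whose normalised words appear consecutively in query."""
    # Map each normalised keyword back to its original spelling.
    index = {" ".join(normalize(w) for w in kw.split()): kw for kw in keywords}
    max_words = max((len(key.split()) for key in index), default=0)
    words = [normalize(w) for w in query.split()]
    found = []
    for start in range(len(words)):
        # Try every run of 1..max_words words beginning at this position.
        for size in range(1, max_words + 1):
            if start + size > len(words):
                break
            candidate = " ".join(words[start:start + size])
            if candidate in index:
                found.append(index[candidate])
    return found

keywords = ["big boy", "red ball", "computer"]
matches = keywords_in_query("the girl likes red balls", keywords)
```

The number of lookups depends on the length of the query and the
longest keyword, not on how many keywords are stored, though the
keyword hash itself still has to fit in memory or be split up.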
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]