Hi Neal,
Thanks for the thoughts.
I was planning on doing a boolean search if needed (*myword OR myword*),
but that will still not find word fragments in the middle of words
(for search words that are neither suffixes nor prefixes). It does not
look like Lucene (or many full text search engines in general) meets that
requirement. I suppose it is a trade-off of features vs. performance.
I am assuming it is generally too expensive an operation for full text
engines (which generally index very large amounts of text data) to
include as a useful feature.
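To make the limitation concrete, here is a minimal sketch in plain Java (no Lucene involved) of why OR-ing a suffix pattern with a prefix pattern still misses a fragment in the middle of a word. The class and method names (WildcardSketch, matches, matchesAny) are hypothetical and exist only for this illustration; the matcher treats '*' as "any run of characters", which is an assumption about the intended wildcard semantics rather than a copy of Lucene's implementation.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Lucene: tests one term against '*' wildcard patterns. */
final class WildcardSketch {

    /** Converts a pattern in which '*' means "any run of characters" into a regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else {
                // Quote every literal character so regex metacharacters are harmless.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** True if the whole term matches the wildcard pattern. */
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    /** True if the term matches at least one pattern (the "OR" case). */
    static boolean matchesAny(String term, String... wildcards) {
        for (String wildcard : wildcards) {
            if (matches(wildcard, term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String term = "unmywordly"; // "myword" sits in the middle of the term
        System.out.println(matchesAny(term, "*myword", "myword*")); // false
        System.out.println(matches("*myword*", term));              // true
    }
}
```

The term only matches when the pattern carries a wildcard on both sides at once, which is exactly the case the boolean workaround cannot express.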
Granroth, Neal V. wrote:
Douglas,
Acceptable performance is a subjective thing.
I am currently running tests with an index of 140005 "documents", and 507027
terms.
A three field, boolean search, using a single term finds 12063 hits in 0.047
seconds.
A three field, boolean search, using a single wildcard term (*word) finds 923
hits in 0.375 seconds.
That's slower by roughly a factor of 8. Significant, yes, but still much faster
than my test UI can display them, and fast enough that supporting wildcard
queries is a useful thing to do.
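One plausible explanation for the gap between the two timings is how many index terms have to be examined. The sketch below is a simplified model in plain Java, not Lucene's actual code: it assumes the term dictionary is a sorted set and that a pattern holds exactly one asterisk. The names TermScanSketch, Expansion and expand are hypothetical. With a trailing wildcard the scan can seek to the literal prefix and stop early; with a leading wildcard the prefix is empty, so every term is visited.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Simplified, hypothetical model of expanding a one-asterisk pattern over a sorted term list. */
final class TermScanSketch {

    /** Result of one expansion: the matching terms and how many terms were visited. */
    static final class Expansion {
        final List<String> terms;
        final int examined;

        Expansion(List<String> terms, int examined) {
            this.terms = terms;
            this.examined = examined;
        }
    }

    static Expansion expand(SortedSet<String> dictionary, String pattern) {
        int star = pattern.indexOf('*');
        if (star < 0 || star != pattern.lastIndexOf('*')) {
            throw new IllegalArgumentException("sketch handles exactly one '*': " + pattern);
        }
        String prefix = pattern.substring(0, star);
        String suffix = pattern.substring(star + 1);

        List<String> matches = new ArrayList<>();
        int examined = 0;
        // Seek to the first term >= prefix; an empty prefix means "start at the very beginning".
        for (String term : dictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            examined++;
            if (term.length() >= prefix.length() + suffix.length() && term.endsWith(suffix)) {
                matches.add(term);
            }
        }
        return new Expansion(matches, examined);
    }

    public static void main(String[] args) {
        SortedSet<String> dictionary = new TreeSet<>(List.of(
                "keyword", "password", "sword", "word", "wordplay", "words", "zebra"));
        Expansion trailing = expand(dictionary, "word*");
        Expansion leading = expand(dictionary, "*word");
        System.out.println(trailing.terms + " after visiting " + trailing.examined + " terms");
        System.out.println(leading.terms + " after visiting " + leading.examined + " terms");
    }
}
```

In this model the cost of a leading wildcard grows with the total number of terms in the field rather than with the number of matches, which would fit the slowdown measured above without making the query unusable.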
Looking at the source (version 1.9.1) for "WildcardQuery" and the class it uses to
process the query, "WildcardTermEnum", it does not appear to support multiple asterisk
wildcards.
However, you could probably compose a boolean query joining two WildcardQueries
to achieve that result.
-- Neal
-----Original Message-----
From: Douglas Smith (DataSmithy) [mailto:[EMAIL PROTECTED]
Sent: Friday, August 31, 2007 9:43 AM
To: [email protected]
Subject: Re: using mutliple wildcards in a term?
Hi Michael,
FYI, with version 2.1, I am using wildcards with the standard query
parser, and it seems to be working the way I expect. That is, if I put
wildcards at the beginning *or* end of a word (prefix or suffix word
part), I get different result counts compared to a word without any
wildcards.
However, I was not able to get wildcards to work with the WildcardQuery
class when searching on a single term (it returned no results). It is
possible I was not using it correctly, since it was my first try.
Also, my index is apparently small enough that I don't get a significant
performance hit from using wildcards at the beginning of a term.
Does anybody know if Lucene supports wildcards at the beginning *and*
end of a term at the same time? I am getting no results when I do this.
Also from an interface design point of view, if Lucene does not support
this, could it be argued that it should throw an error in this case,
instead of returning no results?
Michael Mitiaguin wrote:
Douglas,
I have never used it, but the "Lucene in Action" book says:
Wildcards at the beginning of a term are prohibited using QueryParser, but
an API-coded WildcardQuery may use leading wildcards (at the expense of
performance).
Regards
Michael
On 8/31/07, Douglas Smith <[EMAIL PROTECTED]> wrote:
Hi everyone,
Are wildcard queries intended to be able to support wildcards at the
beginning *and* end of a term?
I am getting search results when I use a single wildcard (*), but not
when I use them at the beginning *and* end of a word. The Lucene Java
documentation seems unclear on this point, but one of my requirements is
to find word fragments in the middle of words.
=====================================
Douglas M. Smith
=====================================
Email: [EMAIL PROTECTED]
Yahoo: [EMAIL PROTECTED]
Jabber: [EMAIL PROTECTED]
=====================================
"For years there has been a theory that millions of monkeys typing at
random on millions of typewriters would reproduce the entire works of
Shakespeare. The Internet has proven this theory to be untrue." -
Unknown
--
======================================
Douglas M. Smith
|--- DataSmithy ---|
email: [EMAIL PROTECTED]
work: 540-322-2204
home: 540-381-8939
fax: 866-330-9401
aim: datasmithy
yahoo: datasmithy
skype: datasmitty
jabber: [EMAIL PROTECTED]
======================================