I posted it to this list because I thought it was more of a development
'issue', but thanks for the quick answer. I'll check out the ngrams and,
if needed, I'll repost my message to the users list. Thanks again!
Karl Wettin wrote:
Hi Jori,
your question is better suited to the java-users list; on this list we
discuss development of the API.
To answer your question, ngrams might solve your problem; tokenizers
are available in contrib/analyzers.
karl
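Karl's n-gram suggestion can be illustrated with a small plain-Java
sketch. It does not use Lucene's API. It only mimics, with a
hypothetical in-memory map, what an edge n-gram analyzer does at index
time: every leading substring of each word is indexed as its own term,
so a partial query term such as "ambu" becomes an exact lookup instead
of a wildcard expansion. The class name, the gram sizes (2 to 15) and
the sample documents are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class EdgeNGramSketch {
    // Return every leading substring ("edge n-gram") of the word whose
    // length lies between minGram and maxGram, inclusive.
    static List<String> edgeNGrams(String word, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int len = minGram; len <= Math.min(maxGram, word.length()); len++) {
            grams.add(word.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Hypothetical toy index: gram -> ids of documents containing a
        // word that starts with that gram.
        Map<String, Set<Integer>> index = new HashMap<>();
        String[] docs = {"ambulance amsterdam", "ambulance rotterdam", "amber amstelveen"};
        for (int id = 0; id < docs.length; id++) {
            for (String word : docs[id].toLowerCase().split("\\s+")) {
                for (String gram : edgeNGrams(word, 2, 15)) {
                    index.computeIfAbsent(gram, k -> new HashSet<>()).add(id);
                }
            }
        }
        // The partial query "ambu amster" becomes two exact term lookups,
        // intersected (ANDed) together; no prefix expansion is needed.
        Set<Integer> hits = new HashSet<>(index.getOrDefault("ambu", Set.of()));
        hits.retainAll(index.getOrDefault("amster", Set.of()));
        System.out.println(hits); // prints [0]
    }
}
```

The trade-off is a larger index, since each word contributes up to
maxGram - minGram + 1 extra terms. Edge n-grams only cover prefixes.
Matching arbitrary fragments from the middle of a word would need full
n-grams, which grow the index further.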
On 5 Feb 2009, at 10.19, d-fader wrote:
Hi,
I'm new to this list, so please don't be too harsh if I've missed some
rules or something. I've been using Lucene for about half a year and I
think it's awesome; respect for all your efforts!
Maybe the 'issue' I'm raising now has already been discussed
thoroughly; in that case I'd appreciate a pointer to those
discussions :) Anyway, here's the thing.
As far as I know, it's impossible to search for partial words with
Lucene (except with the asterisk method, e.g. with the StandardAnalyzer:
ambul* to find ambulance). My problem with that method is that my index
contains a lot of terms. This means that if a user searches for
'ambu amster' (ambulance amsterdam), there will be so many terms to
search that it's not doable. So I started wondering why it's
impossible to search for only a 'part' of a term, or even only the
'start' of a term, and the only reason I could think of was that the
index terms are stored tokenized (in that case you of course can't find
partial terms, since the index doesn't contain the literal terms, but
tokens instead). But Lucene can also store terms untokenized, so in
that case a partial search would be possible, in my humble opinion,
since all terms would be stored 'literally'.
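Why a query like 'ambu amster' gets expensive can be sketched with a
sorted set standing in for the term dictionary. This is a hypothetical
toy list, not Lucene's API. Finding the terms that start with a prefix
is a cheap range scan over sorted terms; the cost comes from the next
step, because a prefix query is rewritten into one OR clause per
matching term. As far as we know, Lucene releases of this period work
this way and cap the clause count (by default at 1024), throwing
BooleanQuery.TooManyClauses beyond that.

```java
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PrefixExpansionSketch {
    // Hypothetical toy term dictionary, kept sorted; a real index may
    // hold millions of distinct terms.
    static final NavigableSet<String> TERMS = new TreeSet<>(List.of(
            "amber", "ambiguous", "ambition", "ambulance", "ambulances", "ambush",
            "amstel", "amstelveen", "amsterdam", "amsterdammer", "rotterdam"));

    // Every term t with prefix <= t < prefix + '\uffff' starts with the
    // prefix (for text in the Basic Multilingual Plane), so the matches
    // form one contiguous range of the sorted dictionary.
    static NavigableSet<String> expandPrefix(String prefix) {
        return TERMS.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    public static void main(String[] args) {
        // Each matching term would become one OR clause in the rewritten query.
        for (String prefix : new String[] {"ambu", "amster", "am"}) {
            NavigableSet<String> clauses = expandPrefix(prefix);
            System.out.println(prefix + "* -> " + clauses.size() + " OR clauses: " + clauses);
        }
    }
}
```

Locating the range is fast whether the field is tokenized or not; what
grows with the index is the number of clauses produced by short
prefixes such as "am". That is the cost the n-gram approach avoids by
turning the partial word into a single exact term.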
Maybe my thinking is wrong; I only have a black-box view of Lucene,
so I don't know much about the indexing algorithm and so on, but I'd
like to know whether this could be done, or else why not :) You see,
the users of my index want to know why they can't search for parts of
the words they enter, and I still can't give them a really good answer
except the 'it would result in too many OR operators in the query'
statement :)
Thanks in advance!
Jori
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org