Re: Analyzer question

Jeff Rodenburg Fri, 19 May 2006 07:59:42 -0700

The Keyword analyzer does no stemming or input modification of any sort:
think of it as WYSIWYG for index population.  The Whitespace analyzer simply
removes spaces from your input (still no stemming), but the tokens are the
individual words.  I don't have the code in front of me, so I'm not sure if
the Whitespace analyzer uses a LowerCaseTokenizer or not.  Analyzers don't
break on partial terms; they may take a term and break it down further, but
the starting point is the term itself.


The analyzer won't detect the patterns -- it's your query.  You'll want to
look at WildcardQuery and/or PhraseQuery.  Given the sequences you've listed
as matches, you'll have to experiment with indexing to meet those
scenarios.  Start there and that should give you a better idea of what you
need from your analyzer.

Hope this helps.

-- j

On 5/19/06, AsifTheManRahman <[EMAIL PROTECTED]> wrote:



I need to know how the following analyzers work:

Whitespace
Keyword

I am looking for an analyzer that will result in a hit if the string that
is
queried appears in the document being searched. For example, if I am
looking
for "A_B_C", then I want the analyzer to detect all of the following
patterns: XXXA_B_CXXX, A_B_C, A_B_CXXX, XXXA_B_C, where XXX can be any
character, string, etc.

I am using the Whitespace analyzer right now and it looks like it only
detects A_B_C when it appears in the document as a single string, i.e.
wrapped around by white spaces (which is what the Whitespace analyzer is
supposed to do, if I'm not mistaken). However, I have tried using the
Keyword Analyzer with 1.9.1 as well, but in vain. I would like to know how
exactly this analyzer would tokenize, say, the patterns mentioned above.

Besides, the KeywordAnalyzer I believe is only available in lucene 1.9.1,
but I prefer 1.4.3 in this case.

Is there any other analyzer that could probably serve my purpose? If not,
then I'm guessing I'll have to build my own. Since I am new to this kind
of
stuff, I was wondering if anyone could give me a heads-on on how/where to
start.

Thanks in advance.
--
View this message in context:
http://www.nabble.com/Analyzer+question-t1650271.html#a4469840
Sent from the Lucene - Java Users forum at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Analyzer question

Reply via email to