Re: Standard Analyzer

Karl Wettin Mon, 25 Aug 2008 02:27:28 -0700


25 aug 2008 kl. 11.14 skrev Kalani Ruwanpathirana:

Hi,

Thanks, I tried WhitespaceAnalyzer too, but it seems case sensitive.


Then you simply add a LowercaseFilter to the chain in the Analyzer:

public final class WhitespaceAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
-    return new WhitespaceTokenizer(reader);
+    TokenStream ts = new WhitespaceTokenizer(reader);
+    ts = new LowercaseFilter(ts);
+    return ts;
  }

If I need to search for words like "correct?", "<html>" (it escapes<, > and
another few characters too) I need to index those kind of words.


That sounds like an XY-problem to me:
http://www.perlmonks.org/index.pl?node_id=542341

What I really was asking about is what problem it is you are trying tosolve by indexing and searching for these sort of tokens.



       karl

On Mon, Aug 25, 2008 at 1:15 PM, Karl Wettin <[EMAIL PROTECTED]>wrote:
25 aug 2008 kl. 09.19 skrev Kalani Ruwanpathirana:

Hi,
I am using StandardAnalyzer when creating the Lucene index. Itindexes theword "wo&rk" as it is but does not index the word "wo*rk" in thatmanner.Can I index such words (including * and ?) as it is? Otherwise Ihave no
way
to index and search for words like "wo*rk", you?, etc.
Try an alternative analyzer, perhaps WhitespaceAnalyzer?(StandardAnalyzerwill index wo&rk as a single term because it contains a rule tohandle names
such as AT&T.)
You should probably also explain why you need to create an indexlike this.
      karl


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Kalani Ruwanpathirana
Department of Computer Science & Engineering
University of Moratuwa



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Standard Analyzer

Reply via email to