On 25 Aug 2008, at 11:14, Kalani Ruwanpathirana wrote:

Hi,

Thanks, I tried WhitespaceAnalyzer too, but it seems case sensitive.

Then you simply add a LowerCaseFilter to the chain in the Analyzer:

public final class WhitespaceAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
-    return new WhitespaceTokenizer(reader);
+    // Tokenize on whitespace only, then lowercase each token.
+    TokenStream ts = new WhitespaceTokenizer(reader);
+    ts = new LowerCaseFilter(ts);
+    return ts;
  }
}
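
As a rough illustration of what that chain does to the text, here is a minimal sketch in plain Java with no Lucene dependency. The class name WhitespaceLowercaseSketch and the method tokenize are hypothetical and only mimic the effect of a whitespace tokenizer followed by a lowercase filter; they are not part of the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for "whitespace tokenizer + lowercase filter".
// It is NOT Lucene code; it only imitates the resulting tokens.
final class WhitespaceLowercaseSketch {

    // Split on runs of whitespace and lowercase each piece.
    // Punctuation such as '?', '*', '<' and '>' is left untouched.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [is, this, correct?, <html>, wo*rk]
        System.out.println(tokenize("Is this Correct? <HTML> wo*rk"));
    }
}
```

The same lowercasing has to happen on the query side as well, otherwise a search for "HTML" would not match the indexed token "html".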


If I need to search for words like "correct?" and "<html>" (it escapes <, > and
a few other characters too), I need to index those kinds of words.

That sounds like an XY-problem to me:
http://www.perlmonks.org/index.pl?node_id=542341

What I was really asking is: what problem are you trying to solve by indexing and searching for these sorts of tokens?


       karl




On Mon, Aug 25, 2008 at 1:15 PM, Karl Wettin <[EMAIL PROTECTED]> wrote:


On 25 Aug 2008, at 09:19, Kalani Ruwanpathirana wrote:

Hi,

I am using StandardAnalyzer when creating the Lucene index. It indexes the word "wo&rk" as is, but does not index the word "wo*rk" in the same way. Can I index such words (including * and ?) as they are? Otherwise I have no way to index and search for words like "wo*rk", "you?", etc.



Try an alternative analyzer, perhaps WhitespaceAnalyzer? (StandardAnalyzer will index wo&rk as a single term because it contains a rule to handle names such as AT&T.)

You should probably also explain why you need to create an index like this.



      karl


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




--
Kalani Ruwanpathirana
Department of Computer Science & Engineering
University of Moratuwa

