25 aug 2008 kl. 11.14 skrev Kalani Ruwanpathirana:
Hi,
Thanks, I tried WhitespaceAnalyzer too, but it seems case sensitive.
Then you simply add a LowercaseFilter to the chain in the Analyzer:
public final class WhitespaceAnalyzer extends Analyzer {
public TokenStream tokenStream(String fieldName, Reader reader) {
- return new WhitespaceTokenizer(reader);
+ TokenStream ts = new WhitespaceTokenizer(reader);
+ ts = new LowercaseFilter(ts);
+ return ts;
}
If I need to search for words like "correct?", "<html>" (it escapes
<, > and
another few characters too) I need to index those kind of words.
That sounds like an XY-problem to me:
http://www.perlmonks.org/index.pl?node_id=542341
What I really was asking about is what problem it is you are trying to
solve by indexing and searching for these sort of tokens.
karl
On Mon, Aug 25, 2008 at 1:15 PM, Karl Wettin <[EMAIL PROTECTED]>
wrote:
25 aug 2008 kl. 09.19 skrev Kalani Ruwanpathirana:
Hi,
I am using StandardAnalyzer when creating the Lucene index. It
indexes the
word "wo&rk" as it is but does not index the word "wo*rk" in that
manner.
Can I index such words (including * and ?) as it is? Otherwise I
have no
way
to index and search for words like "wo*rk", you?, etc.
Try an alternative analyzer, perhaps WhitespaceAnalyzer?
(StandardAnalyzer
will index wo&rk as a single term because it contains a rule to
handle names
such as AT&T.)
You should probably also explain why you need to create an index
like this.
karl
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
--
Kalani Ruwanpathirana
Department of Computer Science & Engineering
University of Moratuwa
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]