Replying to self: silly me. I am obviously creating the term with the wrong
length.

    final String term = new String(buffer, 1, length);

should be replaced by

    final String term = new String(buffer, 1, length - 1);

and the silly trim can go away. I guess I need more coffee.

S.


On Sat, Nov 9, 2013 at 9:45 AM, Stephane Nicoll <stephane.nic...@gmail.com> wrote:

> Hi,
>
> This is what I've tried:
> https://gist.github.com/anonymous/7383104
>
> So far so good, except that something is definitely wrong in my code, as the
> synonym does not seem to be emitted as a valid token. This is how my indexing
> analyzer is built:
>
> private static final class MyIndexAnalyzer extends ReusableAnalyzerBase {
>     @Override
>     protected TokenStreamComponents createComponents(String fieldName,
>             Reader reader) {
>         final Tokenizer tokenizer =
>                 new WhitespaceTokenizer(Version.LUCENE_36, reader);
>         final TwitterFilter twitterFilter = new TwitterFilter(tokenizer);
>         final LowerCaseFilter filter =
>                 new LowerCaseFilter(Version.LUCENE_36, twitterFilter);
>         return new TokenStreamComponents(tokenizer, filter);
>     }
> }
>
> I am expecting the lowercase filter to pick up the synonym exactly the same
> way as the original token, but it does not. If I have the following tweet
> "Bla Bla #SomeTAG", "#sometag" matches but "sometag" does not. All other
> use cases not involving a case mismatch work as expected.
>
> Does anyone know what's wrong in my code?
>
> Thanks for the support!
>
> S.
>
> On Tue, Nov 5, 2013 at 2:17 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> If your universe of items you want to match this way is small,
>> consider something akin to synonyms. Your indexing process
>> emits two tokens, with and without the @ or #, which should
>> cover your situation.
>>
>> FWIW,
>> Erick
>>
>> On Tue, Nov 5, 2013 at 2:40 AM, Stéphane Nicoll
>> <stephane.nic...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > I am building an application that indexes tweets and offers some basic
>> > search facilities on them.
>> >
>> > I am trying to find a combination where the following would work:
>> >
>> > * foo matches the word foo, a mention (@foo) or the hashtag (#foo)
>> > * @foo matches only the mention
>> > * #foo matches only the hashtag
>> >
>> > It should match complete words, so I used the WhitespaceAnalyzer for
>> > indexing.
>> >
>> > Any recommendation for this use case?
>> >
>> > Thanks!
>> > S.
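
P.S. For anyone finding this thread later, here is a minimal sketch of the off-by-one from the top of this message, reproduced outside Lucene. The class name HashtagSliceDemo, the helper names buggy and fixed, and the 16-char buffer are made up for illustration. I am assuming the term buffer has spare capacity beyond the term itself, which would explain why the wrong length did not throw and why a trim appeared to help.

```java
final class HashtagSliceDemo {

    // Variant from the thread: skips the leading '#' (offset 1) but still
    // copies `length` chars, so it reads one char past the end of the term.
    static String buggy(char[] buffer, int length) {
        return new String(buffer, 1, length);
    }

    // Corrected variant: one char is skipped, so one fewer char is copied.
    static String fixed(char[] buffer, int length) {
        return new String(buffer, 1, length - 1);
    }

    public static void main(String[] args) {
        // Hypothetical term buffer that is larger than the term it holds;
        // the unused tail is filled with '\0' chars.
        char[] buffer = new char[16];
        String token = "#SomeTAG";
        token.getChars(0, token.length(), buffer, 0);
        int length = token.length();

        String wrong = buggy(buffer, length);
        String right = fixed(buffer, length);

        System.out.println(wrong.length());            // 8: "SomeTAG" plus a trailing '\0'
        System.out.println(right);                     // SomeTAG
        System.out.println(wrong.equals(right));       // false
        // trim() strips chars <= ' ', which includes '\0', so it hides the bug.
        System.out.println(wrong.trim().equals(right)); // true
    }
}
```

If the buffer happened to be exactly as long as the term, the buggy variant would throw an IndexOutOfBoundsException instead of silently picking up an extra char.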