Hi,

This is what I've tried:
https://gist.github.com/anonymous/7383104

That mostly works, but something is wrong in my code: the synonym does not
seem to be emitted as a valid token. This is how my indexing analyzer is
built:

    private static final class MyIndexAnalyzer extends ReusableAnalyzerBase {
        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            final Tokenizer tokenizer = new WhitespaceTokenizer(Version.LUCENE_36, reader);
            final TwitterFilter twitterFilter = new TwitterFilter(tokenizer);
            final LowerCaseFilter filter = new LowerCaseFilter(Version.LUCENE_36, twitterFilter);
            return new TokenStreamComponents(tokenizer, filter);
        }
    }

I expect the LowerCaseFilter to process the synonym exactly the same way as
the original token, but it does not. Given the tweet "Bla Bla #SomeTAG", a
search for "#sometag" matches but a search for "sometag" does not. All other
use cases, which do not involve a case mismatch, work as expected.
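
To make the expected behaviour concrete, here is a minimal plain-Java sketch
of the tokens I would like to end up with in the index. It does not use
Lucene at all and it is not the TwitterFilter from the gist; the class name
ExpectedTokens and the method expand are hypothetical names used only for
this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ExpectedTokens {

    // Hypothetical helper (not part of Lucene): splits on whitespace,
    // lowercases every token and, for @mentions and #hashtags, also
    // emits the bare word as a synonym right after the original token.
    static List<String> expand(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // empty input yields a single empty string
            }
            String token = raw.toLowerCase(Locale.ROOT);
            tokens.add(token);
            char first = token.charAt(0);
            if (token.length() > 1 && (first == '#' || first == '@')) {
                tokens.add(token.substring(1)); // synonym without the prefix
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [bla, bla, #sometag, sometag]
        System.out.println(expand("Bla Bla #SomeTAG"));
    }
}
```

In other words, both "#sometag" and "sometag" should be indexed in lower
case, whereas my analyzer currently seems to lowercase only the original
token.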

Does anyone know what's wrong in my code?

Thanks for the support!

S.



On Tue, Nov 5, 2013 at 2:17 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> If your universe of items you want to match this way is small,
> consider something akin to synonyms. Your indexing process
> emits two tokens, with and without the @ or # which should
> cover your situation.
>
> FWIW,
> Erick
>
>
> On Tue, Nov 5, 2013 at 2:40 AM, Stéphane Nicoll
> <stephane.nic...@gmail.com> wrote:
>
> > Hi,
> >
> > I am building an application that indexes tweets and offers some basic
> > search facilities on them.
> >
> > I am trying to find a combination where the following would work:
> >
> > * foo matches the foo word, a mention (@foo) or the hashtag (#foo)
> > * @foo only matches the mention
> > * #foo matches only the hashtag
> >
> > It should match complete words, so I used the WhitespaceAnalyzer for
> > indexing.
> >
> > Any recommendation for this use case?
> >
> > Thanks !
> > S.
> >
> > Sent from my iPhone
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
>
