Greetings

We have a problem whereby Lucene 4.7 occasionally does not apply a filter query during searching. The problem is intermittent: roughly one search in thirty returns what appears to be an unfiltered result set. No exceptions or errors occur, just incorrect results. We are using realtime search with multiple index readers. Our software worked fine with earlier versions of Lucene. I've double-checked the query submitted to Lucene, and it appears to be correct. The query looks as follows:

2014-03-28 21:16:38 t.c.s.a.s.StandardSearch [DEBUG] start search {searchquery='',query='*:*',filterQuery='QueryWrapperFilter(+archivedate:[201002280000 TO 201403282115] +cat:email +(to:"john.doug...@mycompany.com.au john.douglas mycompany.com.au john douglas mycompany com au com.au" to:"john....@mycompany.com.au john.doe mycompany.com.au john doe mycompany com au com.au" from:"john.doug...@mycompany.com.au john.douglas mycompany.com.au john douglas mycompany com au com.au" from:"john....@mycompany.com.au john.doe mycompany.com.au john doe mycompany com au com.au" cc:"john.doug...@mycompany.com.au john.douglas mycompany.com.au john douglas mycompany com au com.au" cc:"john....@mycompany.com.au john.doe mycompany.com.au john doe mycompany com au com.au"))',sort='<long: "mydate">!'}
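To make the intended semantics explicit: the date range (bounds inclusive) and the category are required, and because the nested +( ... ) group contains only optional clauses, at least one of the six to/from/cc phrase clauses must match. Below is a simplified, Lucene-independent restatement in plain Java. The Message class and matches method are hypothetical, and exact address membership stands in for the phrase match. Something along these lines can be used to re-check each hit from a suspect result set independently of Lucene.

```java
import java.util.List;
import java.util.Set;

public class FilterSemantics {
    // Hypothetical, simplified model of an indexed message: each address field is
    // reduced to the set of full addresses it contains.
    static final class Message {
        final long archiveDate;   // yyyyMMddHHmm, e.g. 201403282115
        final String cat;
        final Set<String> to, from, cc;

        Message(long archiveDate, String cat, Set<String> to, Set<String> from, Set<String> cc) {
            this.archiveDate = archiveDate;
            this.cat = cat;
            this.to = to;
            this.from = from;
            this.cc = cc;
        }
    }

    // Mirrors +archivedate:[lo TO hi] +cat:email +(to:A to:B from:A from:B cc:A cc:B):
    // both range bounds are inclusive, the category is required, and at least one
    // field/address combination must match.
    static boolean matches(Message m, long lo, long hi, List<String> addresses) {
        if (m.archiveDate < lo || m.archiveDate > hi) return false;
        if (!"email".equals(m.cat)) return false;
        for (String a : addresses) {
            if (m.to.contains(a) || m.from.contains(a) || m.cc.contains(a)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> people = List.of("jane.smith@example.com.au");
        Message hit = new Message(201401010000L, "email",
                Set.of("jane.smith@example.com.au"), Set.of("bob@example.org"), Set.of());
        Message miss = new Message(201401010000L, "email",
                Set.of("bob@example.org"), Set.of("carol@example.org"), Set.of());
        System.out.println(matches(hit, 201002280000L, 201403282115L, people));  // true
        System.out.println(matches(miss, 201002280000L, 201403282115L, people)); // false
    }
}
```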

The string "john....@mycompany.com.au john.doe mycompany.com.au john doe mycompany com au com.au" is the expansion required for the UAX29URLEmailTokenizer. By using quotes, I am aiming for an exact match. This works most of the time, but not every time, whereas it should work every time.
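For clarity, here is a hypothetical plain-Java reconstruction of the token order that the two expansion strings in the log imply. EmailFilter's source isn't shown, so these rules, particularly the trailing multi-label suffixes such as com.au, are inferred from those two samples only, and the example address is made up. Because the quoted form is parsed as a phrase, a match depends on index-time and query-time analysis producing the same token sequence and positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmailExpansionSketch {
    // Hypothetical reconstruction of the tokens EmailFilter appears to emit,
    // inferred only from the two expansion strings in the log.
    static List<String> expand(String address) {
        int at = address.indexOf('@');
        String local = address.substring(0, at);
        String domain = address.substring(at + 1);
        List<String> tokens = new ArrayList<>();
        tokens.add(address);                                // full address
        tokens.add(local);                                  // local part
        tokens.add(domain);                                 // domain
        tokens.addAll(Arrays.asList(local.split("\\.")));   // local-part pieces
        String[] labels = domain.split("\\.");
        tokens.addAll(Arrays.asList(labels));               // individual domain labels
        // Multi-label suffixes shorter than the full domain, e.g. "com.au".
        for (int i = 1; i < labels.length - 1; i++) {
            tokens.add(String.join(".", Arrays.copyOfRange(labels, i, labels.length)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints "jane.smith@example.com.au jane.smith example.com.au jane smith example com au com.au"
        System.out.println(String.join(" ", expand("jane.smith@example.com.au")));
    }
}
```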

I came across https://issues.apache.org/jira/browse/LUCENE-5502 and applied the patch, but it makes no difference. I tried to downgrade Lucene, but the older version won't read the 4.6 indexes. Can anyone suggest a way forward?
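To put a number on the intermittency, a small harness can rerun the same search many times and count the runs in which any hit fails an independent re-check. The sketch below is Lucene-independent: the search supplier and the passesFilter predicate are placeholders (assumptions) for the application's own search call and a per-hit check of the filter conditions. The stand-in search in main simply simulates a bad result every 30th call.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class FilterRepro {
    // Runs the same search repeatedly and counts runs in which at least one hit
    // violates the filter. 'search' and 'passesFilter' are placeholders for the
    // application's own search call and an independent re-check of each hit.
    static <T> int countBadRuns(int runs, Supplier<List<T>> search, Predicate<T> passesFilter) {
        int bad = 0;
        for (int i = 0; i < runs; i++) {
            List<T> hits = search.get();
            if (!hits.stream().allMatch(passesFilter)) bad++;
        }
        return bad;
    }

    public static void main(String[] args) {
        // Simulated search: returns an "unfiltered" result (contains an odd number) on every 30th call.
        int[] calls = {0};
        Supplier<List<Integer>> fake = () -> (++calls[0] % 30 == 0) ? List.of(1, 2, 3) : List.of(2);
        int bad = countBadRuns(300, fake, n -> n % 2 == 0);
        // prints "10 of 300 runs returned hits outside the filter"
        System.out.println(bad + " of 300 runs returned hits outside the filter");
    }
}
```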

Thanks for your recommendations

Jamie

-------------------------

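import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.StopAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.analysis.util.StopwordAnalyzerBase;
import org.apache.lucene.util.Version;

// EmailFilter is a custom TokenFilter (not part of Lucene; source not shown).
public final class EmailAnalyzer extends StopwordAnalyzerBase {

  public static final int DEFAULT_MAX_TOKEN_LENGTH = StandardAnalyzer.DEFAULT_MAX_TOKEN_LENGTH;
  public static final CharArraySet STOP_WORDS_SET = StopAnalyzer.ENGLISH_STOP_WORDS_SET;

  private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;

  public EmailAnalyzer(Version matchVersion, CharArraySet stopWords) {
    super(matchVersion, stopWords);
  }

  public EmailAnalyzer(Version matchVersion) {
    this(matchVersion, STOP_WORDS_SET);
  }

  public EmailAnalyzer(Version matchVersion, Reader stopwords) throws IOException {
    this(matchVersion, loadStopwordSet(stopwords, matchVersion));
  }

  public void setMaxTokenLength(int length) {
    maxTokenLength = length;
  }

  public int getMaxTokenLength() {
    return maxTokenLength;
  }

  @Override
  protected TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
    // Chain: UAX29URLEmailTokenizer -> EmailFilter -> LowerCaseFilter.
    // Note: there is no StopFilter here, so the stopword set is not applied.
    final UAX29URLEmailTokenizer src = new UAX29URLEmailTokenizer(matchVersion, reader);
    src.setMaxTokenLength(maxTokenLength);
    TokenStream tok = new EmailFilter(src);
    tok = new LowerCaseFilter(matchVersion, tok);
    return new TokenStreamComponents(src, tok) {
      @Override
      protected void setReader(final Reader reader) throws IOException {
        // Components are reused; re-apply the current max token length for each new input.
        src.setMaxTokenLength(EmailAnalyzer.this.maxTokenLength);
        super.setReader(reader);
      }
    };
  }
}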
