I traced this to this block in FuzzyTermsEnum:

    if (ed == 0) { // exact match
      boostAtt.setBoost(1.0F);
    } else {
      final int codePointCount = UnicodeUtil.codePointCount(term);
      int minTermLength = Math.min(codePointCount, termLength);

      float similarity = 1.0f - (float) ed / (float) minTermLength;
      boostAtt.setBoost(similarity);
    }

In your test ed (the edit distance) was 2 and minTermLength was 1, so the
similarity works out to 1.0 - 2/1 = -1.0, which is the negative boost you
are seeing. I don't fully understand this code, but I wonder if it should
divide by maxTermLength instead of minTermLength?
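Something like this, as an untested sketch (maxTermLength is just my name
for it here, not an existing variable in FuzzyTermsEnum):

    final int codePointCount = UnicodeUtil.codePointCount(term);
    // Divide by the longer of the two lengths: the Levenshtein distance
    // between two strings can never exceed the length of the longer one,
    // so the similarity stays in [0, 1] and the boost is never negative.
    int maxTermLength = Math.max(codePointCount, termLength);

    float similarity = 1.0f - (float) ed / (float) maxTermLength;
    boostAtt.setBoost(similarity);

With the values from your test (ed=2, lengths 1 and 2) this would give a
boost of 0.0 instead of -1.0, avoiding the IllegalArgumentException.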
On Thu, Oct 1, 2020 at 9:54 AM Juraj Jurčo <jjurco...@gmail.com> wrote:
>
> Hi guys,
> we are trying to implement search and we have experienced a strange
> situation. When our text contains an apostrophe followed by a single
> character AND our search query is composed of exactly two letters
> followed by the fuzzy operator (~) AND we use highlighting, we get an
> exception:
>
>> java.lang.IllegalArgumentException: boost must be a positive float, got -1.0
>
> It seems there is a problem at FuzzyTermsEnum.java:271 (float similarity =
> 1.0f - (float) ed / (float) minTermLength): when it is reached with ed=2,
> it sets a negative boost.
>
> I was able to reproduce the error with the following code:
>
> import java.io.IOException;
> import java.nio.file.Path;
>
> import org.apache.commons.io.FileUtils;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.core.SimpleAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.document.TextField;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.queryparser.classic.ParseException;
> import org.apache.lucene.queryparser.classic.QueryParser;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.highlight.Highlighter;
> import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
> import org.apache.lucene.search.highlight.QueryScorer;
> import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
> import org.apache.lucene.search.highlight.TokenSources;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.FSDirectory;
> import org.junit.jupiter.api.Test;
>
> class FindSqlHighlightTest {
>
>   @Test
>   void reproduceHighlightProblem() throws IOException, ParseException,
>       InvalidTokenOffsetsException {
>     String text = "doesn't";
>     String field = "text";
>     // NOK: se~, se~2 and any higher number
>     // OK: sel~, s~, se~1
>     String uQuery = "se~";
>     int maxStartOffset = -1;
>     Analyzer analyzer = new SimpleAnalyzer();
>
>     Path indexLocation = Path.of("temp",
>         "reproduceHighlightProblem").toAbsolutePath();
>     if (indexLocation.toFile().exists()) {
>       FileUtils.deleteDirectory(indexLocation.toFile());
>     }
>     Directory indexDir = FSDirectory.open(indexLocation);
>
>     // Create index
>     IndexWriterConfig dimsIndexWriterConfig = new IndexWriterConfig(analyzer);
>     dimsIndexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
>     IndexWriter idxWriter = new IndexWriter(indexDir, dimsIndexWriterConfig);
>     // add doc
>     Document doc = new Document();
>     doc.add(new TextField(field, text, Field.Store.NO));
>     idxWriter.addDocument(doc);
>     // commit
>     idxWriter.commit();
>     idxWriter.close();
>
>     // search & highlight
>     Query query = new QueryParser(field, analyzer).parse(uQuery);
>     Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(),
>         new QueryScorer(query));
>     TokenStream tokenStream = TokenSources.getTokenStream(field, null,
>         text, analyzer, maxStartOffset);
>     String highlighted = highlighter.getBestFragment(tokenStream, text);
>     System.out.println(highlighted);
>   }
> }
>
> Could you please confirm whether it's a bug in Lucene or whether we do
> something that is not allowed?
>
> Thanks a lot!
> Best,
> Juraj
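P.S. The reason a two-letter query gets compared against a one-code-point
term is that SimpleAnalyzer tokenizes on non-letter characters, so "doesn't"
is indexed as the two terms "doesn" and "t". The query term "se" is within
the default fuzzy maxEdits of 2 from "t" (one substitution plus one
deletion), which is how FuzzyTermsEnum reaches ed=2 with minTermLength=1.
A quick standalone snippet to see the tokenization (not part of your test):

    import java.io.IOException;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.core.SimpleAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class TokenizeDemo {
      public static void main(String[] args) throws IOException {
        Analyzer analyzer = new SimpleAnalyzer();
        // SimpleAnalyzer splits on non-letters and lowercases, so the
        // apostrophe breaks "doesn't" into "doesn" and "t".
        try (TokenStream ts = analyzer.tokenStream("text", "doesn't")) {
          CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
          ts.reset();
          while (ts.incrementToken()) {
            System.out.println(term.toString()); // prints: doesn, then t
          }
          ts.end();
        }
      }
    }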