I traced this to this block in FuzzyTermsEnum:

    if (ed == 0) { // exact match
      boostAtt.setBoost(1.0F);
    } else {
      final int codePointCount = UnicodeUtil.codePointCount(term);
      int minTermLength = Math.min(codePointCount, termLength);

      float similarity = 1.0f - (float) ed / (float) minTermLength;
      boostAtt.setBoost(similarity);
    }

In your test ed (the edit distance) was 2 and minTermLength was 1, so the
similarity works out to 1.0 - 2/1 = -1.0, which is the negative boost you
are seeing. I don't fully understand this code, but I wonder if it should
divide by maxTermLength instead of minTermLength?
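Something like this, as an untested sketch (maxTermLength is just my name
for it here, not an existing variable in FuzzyTermsEnum):

    final int codePointCount = UnicodeUtil.codePointCount(term);
    // Divide by the longer of the two lengths: the Levenshtein distance
    // between two strings can never exceed the length of the longer one,
    // so the similarity stays in [0, 1] and the boost is never negative.
    int maxTermLength = Math.max(codePointCount, termLength);

    float similarity = 1.0f - (float) ed / (float) maxTermLength;
    boostAtt.setBoost(similarity);

With the values from your test (ed=2, lengths 1 and 2) this would give a
boost of 0.0 instead of -1.0, avoiding the IllegalArgumentException.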
On Thu, Oct 1, 2020 at 9:54 AM Juraj Jurčo <jjurco...@gmail.com> wrote:
>
> Hi guys,
> we are trying to implement search and we have experienced a strange
> situation. When our text contains an apostrophe followed by a single
> character AND our search query is composed of exactly two letters
> followed by the fuzzy operator (~) AND we use highlighting, we get an
> exception:
>
>> java.lang.IllegalArgumentException: boost must be a positive float, got -1.0
>
> It seems there is a problem at FuzzyTermsEnum.java:271 (float similarity =
> 1.0f - (float) ed / (float) minTermLength): when it is reached with ed=2,
> it sets a negative boost.
>
> I was able to reproduce the error with the following code:
>
> import java.io.IOException;
> import java.nio.file.Path;
>
> import org.apache.commons.io.FileUtils;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.core.SimpleAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.document.TextField;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.queryparser.classic.ParseException;
> import org.apache.lucene.queryparser.classic.QueryParser;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.highlight.Highlighter;
> import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
> import org.apache.lucene.search.highlight.QueryScorer;
> import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
> import org.apache.lucene.search.highlight.TokenSources;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.store.FSDirectory;
> import org.junit.jupiter.api.Test;
>
> class FindSqlHighlightTest {
>
>   @Test
>   void reproduceHighlightProblem() throws IOException, ParseException,
>       InvalidTokenOffsetsException {
>     String text = "doesn't";
>     String field = "text";
>     // NOK: se~, se~2 and any higher number
>     // OK: sel~, s~, se~1
>     String uQuery = "se~";
>     int maxStartOffset = -1;
>     Analyzer analyzer = new SimpleAnalyzer();
>
>     Path indexLocation = Path.of("temp",
>         "reproduceHighlightProblem").toAbsolutePath();
>     if (indexLocation.toFile().exists()) {
>       FileUtils.deleteDirectory(indexLocation.toFile());
>     }
>     Directory indexDir = FSDirectory.open(indexLocation);
>
>     // Create index
>     IndexWriterConfig dimsIndexWriterConfig = new IndexWriterConfig(analyzer);
>     dimsIndexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
>     IndexWriter idxWriter = new IndexWriter(indexDir, dimsIndexWriterConfig);
>     // add doc
>     Document doc = new Document();
>     doc.add(new TextField(field, text, Field.Store.NO));
>     idxWriter.addDocument(doc);
>     // commit
>     idxWriter.commit();
>     idxWriter.close();
>
>     // search & highlight
>     Query query = new QueryParser(field, analyzer).parse(uQuery);
>     Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(),
>         new QueryScorer(query));
>     TokenStream tokenStream = TokenSources.getTokenStream(field, null,
>         text, analyzer, maxStartOffset);
>     String highlighted = highlighter.getBestFragment(tokenStream, text);
>     System.out.println(highlighted);
>   }
> }
>
> Could you please confirm whether it's a bug in Lucene or whether we do
> something that is not allowed?
>
> Thanks a lot!
> Best,
> Juraj
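P.S. The reason a two-letter query gets compared against a one-code-point
term is that SimpleAnalyzer tokenizes on non-letter characters, so "doesn't"
is indexed as the two terms "doesn" and "t". The query term "se" is within
the default fuzzy maxEdits of 2 from "t" (one substitution plus one
deletion), which is how FuzzyTermsEnum reaches ed=2 with minTermLength=1.
A quick standalone snippet to see the tokenization (not part of your test):

    import java.io.IOException;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.core.SimpleAnalyzer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public class TokenizeDemo {
      public static void main(String[] args) throws IOException {
        Analyzer analyzer = new SimpleAnalyzer();
        // SimpleAnalyzer splits on non-letters and lowercases, so the
        // apostrophe breaks "doesn't" into "doesn" and "t".
        try (TokenStream ts = analyzer.tokenStream("text", "doesn't")) {
          CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
          ts.reset();
          while (ts.incrementToken()) {
            System.out.println(term.toString()); // prints: doesn, then t
          }
          ts.end();
        }
      }
    }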