[ 
https://issues.apache.org/jira/browse/LUCENE-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3846:
------------------------------------

    Attachment: LUCENE-3846.patch

Here is a patch that adds the missing intersect method and several tests 
derived from AnalyzingSuggesterTest. The tests all pass at this point, but I 
get a weird failure when I run the benchmarks: somehow the TopNSearcher runs 
into a bad state that I can't really figure out.

This patch includes several refactorings in AnalyzingSuggester, mainly to make 
testing easier in the fuzzy case (encapsulated some stuff into package-private 
methods, etc.). There are still tons of nocommits, but at least we have 
something working.

Regarding the failure, I see a NoSuchElementException from the "queue" in the 
TopNSearcher, which somehow removed the bottom and then tries to pull the last 
element, which doesn't exist (stacktrace below). The funky thing is that 
this doesn't happen if I run with exactFirst=false, yet the problem seems 
to be in the non-exactFirst part (see stacktrace). I use a direct intersection 
for exactFirst in the fuzzy case, so that code is identical to the analyzing 
suggester, since the intersection of the LD automaton doesn't return enough 
information to tell what is an exact match.

Here is the stacktrace:

{code}
java.util.NoSuchElementException
        at java.util.TreeMap.key(TreeMap.java:1206)
        at java.util.TreeMap.lastKey(TreeMap.java:274)
        at java.util.TreeSet.last(TreeSet.java:384)
        at org.apache.lucene.util.fst.Util$TopNSearcher.addIfCompetitive(Util.java:339)
        at org.apache.lucene.util.fst.Util$TopNSearcher.search(Util.java:453)
        at org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.lookup(AnalyzingSuggester.java:581)
        at org.apache.lucene.search.suggest.LookupBenchmarkTest$2.call(LookupBenchmarkTest.java:228)
        at org.apache.lucene.search.suggest.LookupBenchmarkTest$2.call(LookupBenchmarkTest.java:1)
        at org.apache.lucene.search.suggest.LookupBenchmarkTest.measure(LookupBenchmarkTest.java:253)
        at org.apache.lucene.search.suggest.LookupBenchmarkTest.runPerformanceTest(LookupBenchmarkTest.java:224)
        at org.apache.lucene.search.suggest.LookupBenchmarkTest.testPerformanceOnPrefixes6_9(LookupBenchmarkTest.java:192)
NOTE: reproduce with: ant test  -Dtestcase=LookupBenchmarkTest -Dtests.method=testPerformanceOnPrefixes6_9 -Dtests.seed=B5BAF2A9592263BC -Dtests.locale=fi_FI -Dtests.timezone=Africa/Lagos -Dtests.file.encoding=UTF-8
NOTE: test params are: codec=Lucene40: {}, sim=DefaultSimilarity, locale=fi_FI, timezone=Africa/Lagos
NOTE: Linux 2.6.38-16-generic amd64/Sun Microsystems Inc. 1.6.0_26 (64-bit)/cpus=12,threads=1,free=578809008,total=1539571712
NOTE: All tests run in this JVM: [LookupBenchmarkTest]

{code}
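
For illustration only, the top frames of the trace (TreeSet.last() throwing 
NoSuchElementException) can be reproduced in isolation. The sketch below is 
not the code in Util.TopNSearcher; the class and method names are hypothetical 
and it only shows what happens when a "bottom of the queue" check runs against 
a queue that has already been emptied, next to a guarded variant. It says 
nothing about why the queue becomes empty in the benchmark.

```java
import java.util.NoSuchElementException;
import java.util.TreeSet;

// Hypothetical sketch; NOT the code in org.apache.lucene.util.fst.Util.
final class EmptyQueueSketch {

    // Unguarded check: assumes the queue always holds a "bottom" element.
    static boolean isCompetitiveUnguarded(TreeSet<Long> queue, long cost) {
        // TreeSet.last() throws NoSuchElementException when the set is empty.
        return cost < queue.last();
    }

    // Guarded check: an empty queue accepts any candidate.
    static boolean isCompetitiveGuarded(TreeSet<Long> queue, long cost) {
        return queue.isEmpty() || cost < queue.last();
    }

    public static void main(String[] args) {
        TreeSet<Long> queue = new TreeSet<Long>();
        queue.add(5L);
        queue.pollLast(); // removes the only element; the queue is now empty

        System.out.println("guarded: " + isCompetitiveGuarded(queue, 7L));
        try {
            isCompetitiveUnguarded(queue, 7L);
        } catch (NoSuchElementException e) {
            System.out.println("unguarded: " + e.getClass().getSimpleName());
        }
    }
}
```

A guard like this would only hide the symptom; whether an empty queue is a 
legal state at that point in the search is the open question.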

Mike, if you get a chance, it would be great if you could look into that one.

                
> Fuzzy suggester
> ---------------
>
>                 Key: LUCENE-3846
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3846
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.1
>
>         Attachments: LUCENE-3846_fuzzy_analyzing.patch, LUCENE-3846.patch, 
> LUCENE-3846.patch, LUCENE-3846.patch
>
>
> Would be nice to have a suggester that can handle some fuzziness (like spell 
> correction) so that it's able to suggest completions that are "near" what you 
> typed.
> As a first go at this, I implemented 1T (ie up to 1 edit, including a 
> transposition), except the first letter must be correct.
> But there is a penalty, ie, the "corrected" suggestion needs to have a much 
> higher freq than the "exact match" suggestion before it can compete.
> Still tons of nocommits, and somehow we should merge this / make it work with 
> analyzing suggester too (LUCENE-3842).
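
As a rough illustration of the "1T, first letter must be correct" rule in the 
description above, here is a minimal standalone sketch. It is an assumption-laden 
simplification: it compares two whole strings char by char, whereas the suggester 
matches prefixes by intersecting an automaton with the FST, and it ignores the 
frequency penalty entirely. The class and method names are hypothetical.

```java
// Hypothetical sketch of a "1T" check: at most one insertion, deletion,
// substitution, or adjacent transposition, with the first character fixed.
final class OneEditSketch {

    static boolean withinOneEdit(String typed, String candidate) {
        if (typed.isEmpty() || candidate.isEmpty()) {
            return false;
        }
        // The first letter must be correct.
        if (typed.charAt(0) != candidate.charAt(0)) {
            return false;
        }
        int n = typed.length();
        int m = candidate.length();
        if (Math.abs(n - m) > 1) {
            return false;
        }
        // Skip the common prefix; i is the index of the first mismatch.
        int i = 0;
        while (i < n && i < m && typed.charAt(i) == candidate.charAt(i)) {
            i++;
        }
        if (i == n && i == m) {
            return true; // identical strings, zero edits
        }
        if (n == m) {
            // One substitution: everything after position i matches.
            if (typed.regionMatches(i + 1, candidate, i + 1, n - i - 1)) {
                return true;
            }
            // One transposition: positions i and i+1 are swapped.
            return i + 1 < n
                && typed.charAt(i) == candidate.charAt(i + 1)
                && typed.charAt(i + 1) == candidate.charAt(i)
                && typed.regionMatches(i + 2, candidate, i + 2, n - i - 2);
        }
        if (n + 1 == m) {
            // candidate has one extra character at position i.
            return typed.regionMatches(i, candidate, i + 1, n - i);
        }
        // typed has one extra character at position i.
        return typed.regionMatches(i + 1, candidate, i, m - i);
    }

    public static void main(String[] args) {
        System.out.println(withinOneEdit("lucnee", "lucene")); // transposition
        System.out.println(withinOneEdit("ucene", "lucene"));  // wrong first letter
    }
}
```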
