[ https://issues.apache.org/jira/browse/LUCENE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661302#action_12661302 ]
Otis Gospodnetic commented on LUCENE-1513:
------------------------------------------

I feel like I missed some FastSS discussion on the list... was there one?

I took a quick look at the paper and the code. Is the following the general idea:
# index "fuzzy"/"misspelled" terms in addition to the normal terms (=> larger index, slower indexing). How much fuzziness one wants to allow or handle is decided at index time.
# rewrite the query to include variations/misspellings of each term and use that to search (=> more clauses, slower than normal search, but faster than the "normal" fuzzy query, whose speed depends on the number of indexed terms)
?

Quick code comments:
* Need to add ASL
* Need to replace tabs with 2 spaces and fix formatting in FuzzyHitCollector
* No @author tags
* Unit test if possible
* Should FastSSwC not be able to take a variable K?
* Should variables named after types (e.g. "set" in public static String getNeighborhoodString(Set<String> set) { ) be renamed, so they describe what's in them instead? (easier to understand API?)

> fastss fuzzyquery
> -----------------
>
>                 Key: LUCENE-1513
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1513
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>            Reporter: Robert Muir
>            Priority: Minor
>         Attachments: fastSSfuzzy.zip
>
>
> code for doing fuzzyqueries with fastssWC algorithm.
> FuzzyIndexer: given a lucene field, it enumerates all terms and creates an
> auxiliary offline index for fuzzy queries.
> FastFuzzyQuery: similar to fuzzy query except it queries the auxiliary index
> to retrieve a candidate list. this list is then verified with the Levenshtein
> algorithm.
> sorry but the code is a bit messy... what I'm actually using is very
> different from this so it's pretty much untested. but at least you can see
> what's going on or fix it up.
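
For anyone following along without the paper at hand, the approach in the issue description (build an auxiliary index of term variants, fetch candidates for the query, then verify them with Levenshtein) can be sketched roughly as below. This is only an illustration of the deletion-neighborhood idea, not the code in fastSSfuzzy.zip: the class FastSsSketch, its method names, and the in-memory HashMap standing in for the auxiliary Lucene index are all assumptions made up for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these names come from the attached patch.
final class FastSsSketch {
    // Stand-in for the auxiliary index: deletion variant -> original terms.
    private final Map<String, Set<String>> variantToTerms = new HashMap<>();
    private final int k; // maximum edit distance, fixed when the index is built

    FastSsSketch(int k) { this.k = k; }

    // All strings reachable from word by deleting at most k characters
    // (the word itself is included).
    static Set<String> deletionNeighborhood(String word, int k) {
        Set<String> out = new HashSet<>();
        collect(word, k, out);
        return out;
    }

    private static void collect(String word, int k, Set<String> out) {
        out.add(word);
        if (k == 0) return;
        for (int i = 0; i < word.length(); i++) {
            collect(word.substring(0, i) + word.substring(i + 1), k - 1, out);
        }
    }

    // Index time: store every deletion variant of the term.
    void addTerm(String term) {
        for (String variant : deletionNeighborhood(term, k)) {
            variantToTerms.computeIfAbsent(variant, v -> new TreeSet<>()).add(term);
        }
    }

    // Query time: look up the query's variants, then verify each candidate,
    // because a shared variant only guarantees a distance of at most 2k.
    Set<String> search(String query) {
        Set<String> candidates = new TreeSet<>();
        for (String variant : deletionNeighborhood(query, k)) {
            candidates.addAll(variantToTerms.getOrDefault(variant, Collections.emptySet()));
        }
        Set<String> hits = new TreeSet<>();
        for (String candidate : candidates) {
            if (levenshtein(query, candidate) <= k) hits.add(candidate);
        }
        return hits;
    }

    // Standard two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }
}
```

With k = 1 and the terms "lucene", "lucent" and "search" added, a call such as search("lucen") would look up six variants instead of scanning every indexed term. The cost is the one already noted above: each term contributes several entries to the auxiliary index, and k has to be chosen before indexing.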