[ 
https://issues.apache.org/jira/browse/LUCENE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661302#action_12661302
 ] 

Otis Gospodnetic commented on LUCENE-1513:
------------------------------------------

I feel like I missed a FastSS discussion on the list. Was there one?

I took a quick look at the paper and the code.  Is the following the general 
idea?
# Index "fuzzy"/"misspelled" terms in addition to the normal terms (=> larger 
index, slower indexing).  How much fuzziness one wants to allow or handle is 
decided at index time.
# Rewrite the query to include variations/misspellings of each term and use 
that to search (=> more clauses, so slower than a normal search, but faster 
than the "normal" fuzzy query, whose speed depends on the number of indexed 
terms).
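
To make the question concrete, here is a minimal Java sketch of the 
deletion-neighborhood idea as I read it from the FastSS paper. It is not the 
code from the attachment: the class and method names (DeletionNeighborhood, 
of, isCandidate) are made up for illustration, and k is assumed to be the 
maximum number of deleted characters chosen at index time.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of the attached patch.
final class DeletionNeighborhood {
    private DeletionNeighborhood() {}

    /** Returns the term itself plus every string made by deleting up to k characters. */
    static Set<String> of(String term, int k) {
        Set<String> variants = new HashSet<>();
        collect(term, k, variants);
        return variants;
    }

    private static void collect(String s, int remaining, Set<String> out) {
        out.add(s);
        if (remaining == 0) {
            return;
        }
        // Delete each character in turn and recurse with one fewer deletion left.
        for (int i = 0; i < s.length(); i++) {
            collect(s.substring(0, i) + s.substring(i + 1), remaining - 1, out);
        }
    }

    /** An indexed term is a candidate when both neighborhoods share at least one variant. */
    static boolean isCandidate(String indexedTerm, String queryTerm, int k) {
        Set<String> shared = new HashSet<>(of(indexedTerm, k));
        shared.retainAll(of(queryTerm, k));
        return !shared.isEmpty();
    }
}
```

In this sketch the variants of every indexed term would be stored at index 
time, and the variants of the query term would be looked up at search time, 
so the work per query depends on the term length and k rather than on the 
number of indexed terms.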

Quick code comments:
* Need to add the ASL header
* Need to replace tabs with 2 spaces and fix the formatting in 
FuzzyHitCollector
* No @author tags
* Add a unit test if possible
* Shouldn't FastSSwC be able to take a variable K?
* Should variables named after their types (e.g. "set" in public static String 
getNeighborhoodString(Set<String> set) { ) be renamed so they describe what's 
in them instead? (Easier-to-understand API?)


> fastss fuzzyquery
> -----------------
>
>                 Key: LUCENE-1513
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1513
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>            Reporter: Robert Muir
>            Priority: Minor
>         Attachments: fastSSfuzzy.zip
>
>
> code for doing fuzzyqueries with fastssWC algorithm.
> FuzzyIndexer: given a lucene field, it enumerates all terms and creates an 
> auxiliary offline index for fuzzy queries.
> FastFuzzyQuery: similar to fuzzy query except it queries the auxiliary index 
> to retrieve a candidate list. this list is then verified with the Levenshtein 
> algorithm.
> sorry but the code is a bit messy... what I'm actually using is very 
> different from this so it's pretty much untested. but at least you can see 
> what's going on or fix it up.
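
The verification step described above could look roughly like the following 
Java sketch. It is an assumption about the general technique, not the code in 
fastSSfuzzy.zip: CandidateVerifier, levenshtein, verify and maxEdits are 
hypothetical names, and the distance is the plain two-row dynamic-programming 
Levenshtein distance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the attached patch.
final class CandidateVerifier {
    private CandidateVerifier() {}

    /** Classic Levenshtein edit distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Keeps only the candidates whose edit distance to the query term is at most maxEdits. */
    static List<String> verify(String queryTerm, List<String> candidates, int maxEdits) {
        List<String> accepted = new ArrayList<>();
        for (String candidate : candidates) {
            if (levenshtein(queryTerm, candidate) <= maxEdits) {
                accepted.add(candidate);
            }
        }
        return accepted;
    }
}
```

The check matters because a candidate list built from shared deletion 
variants can contain false positives: "ab" and "ba" both reduce to "a" after 
one deletion, yet their Levenshtein distance is 2.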

