[
https://issues.apache.org/jira/browse/LUCENE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661302#action_12661302
]
Otis Gospodnetic commented on LUCENE-1513:
------------------------------------------
I feel like I missed some FastSS discussion on the list.... was there one?
I took a quick look at the paper and the code. Is the following the general
idea:
# index "fuzzy"/"misspelled" terms in addition to the normal terms (=> larger
index, slower indexing). How much fuzziness one wants to allow or handle is
decided at index time.
# rewrite the query to include variations/misspellings of each term and use
that to search (=> more clauses, slower than normal search, but faster than the
"normal" fuzzy query whose speed depends on the number of indexed terms)
?
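To make the two steps above concrete, here is a minimal sketch of my reading of the idea. It is not taken from the attached code: the class and method names are hypothetical, and an in-memory map stands in for the auxiliary index. It assumes the "fuzzy" terms are deletion variants, i.e. every string obtained by deleting up to k characters from a term. The same expansion is applied to the query term, and any indexed term sharing a variant becomes a candidate.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from fastSSfuzzy.zip.
class DeletionNeighborhoodSketch {

  // All strings obtainable from term by deleting at most k characters
  // (the term itself is included).
  static Set<String> neighborhood(String term, int k) {
    Set<String> result = new LinkedHashSet<>();
    result.add(term);
    Set<String> frontier = new LinkedHashSet<>();
    frontier.add(term);
    for (int round = 0; round < k; round++) {
      Set<String> next = new LinkedHashSet<>();
      for (String s : frontier) {
        for (int i = 0; i < s.length(); i++) {
          // Drop the character at position i.
          next.add(s.substring(0, i) + s.substring(i + 1));
        }
      }
      result.addAll(next);
      frontier = next;
    }
    return result;
  }

  // Index time: map every deletion variant back to the terms that produce it.
  // This is where the larger index and slower indexing come from.
  static Map<String, Set<String>> buildIndex(List<String> terms, int k) {
    Map<String, Set<String>> index = new HashMap<>();
    for (String term : terms) {
      for (String variant : neighborhood(term, k)) {
        index.computeIfAbsent(variant, v -> new TreeSet<>()).add(term);
      }
    }
    return index;
  }

  // Query time: expand the query term the same way and union the matches.
  // The work depends on the query length and k, not on the number of indexed terms.
  static Set<String> candidates(Map<String, Set<String>> index, String query, int k) {
    Set<String> found = new TreeSet<>();
    for (String variant : neighborhood(query, k)) {
      Set<String> terms = index.get(variant);
      if (terms != null) {
        found.addAll(terms);
      }
    }
    return found;
  }
}
```

Note that k has to be the same on both sides, which is why the amount of fuzziness is fixed at index time. The result is only a candidate set: two terms can share a deletion variant and still be more than k edits apart, so a verification pass is still needed.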
Quick code comments:
* Need to add ASL
* Need to replace tabs with 2 spaces and fix formatting in FuzzyHitCollector
* No @author
* Unit test if possible
* Should FastSSwC not be able to take a variable K?
* Should variables named after types (e.g. "set" in public static String
getNeighborhoodString(Set<String> set) { ) be renamed, so they describe what's
in them instead? (easier to understand API?)
> fastss fuzzyquery
> -----------------
>
> Key: LUCENE-1513
> URL: https://issues.apache.org/jira/browse/LUCENE-1513
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/*
> Reporter: Robert Muir
> Priority: Minor
> Attachments: fastSSfuzzy.zip
>
>
> Code for doing fuzzy queries with the FastSSwC algorithm.
> FuzzyIndexer: given a lucene field, it enumerates all terms and creates an
> auxiliary offline index for fuzzy queries.
> FastFuzzyQuery: similar to fuzzy query except it queries the auxiliary index
> to retrieve a candidate list. This list is then verified with the Levenshtein
> algorithm.
> Sorry, but the code is a bit messy... what I'm actually using is very
> different from this, so it's pretty much untested. But at least you can see
> what's going on or fix it up.
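The verification step mentioned in the description can be sketched as follows. This is only an illustration under my own assumptions, not the code in the attachment: the class name, method names, and the maxEdits parameter are hypothetical. It computes the standard Levenshtein distance with two rolling rows and keeps only the candidates within the allowed number of edits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the verifier used by FastFuzzyQuery.
class CandidateVerifierSketch {

  // Classic dynamic-programming Levenshtein distance (insert, delete, substitute),
  // keeping only the previous and current rows of the table.
  static int levenshtein(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j; // distance from the empty prefix of a
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i; // distance to the empty prefix of b
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  // Keep only the candidates that are really within maxEdits of the query term.
  static List<String> verify(String query, List<String> candidates, int maxEdits) {
    List<String> kept = new ArrayList<>();
    for (String candidate : candidates) {
      if (levenshtein(query, candidate) <= maxEdits) {
        kept.add(candidate);
      }
    }
    return kept;
  }
}
```

For example, "ab" and "ba" share the deletion variant "b" but are two edits apart, so with maxEdits set to 1 the candidate "ba" would be dropped for the query "ab".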