[ https://issues.apache.org/jira/browse/DRILL-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982753#comment-14982753 ]
Karol Potocki commented on DRILL-3747:
--------------------------------------
Such functionality is often needed when searching data produced by user
collaboration (e.g. street names in internet data sources) or when building
search conditions from user input (to handle spelling mistakes).
I recently needed a solution like that; a basic implementation is on my GitHub:
https://github.com/k255/drill-fuzzy-search
It is built on the simmetrics library, which recently moved to the Apache license.
> UDF for "fuzzy" string and similarity matching
> ----------------------------------------------
>
> Key: DRILL-3747
> URL: https://issues.apache.org/jira/browse/DRILL-3747
> Project: Apache Drill
> Issue Type: New Feature
> Components: Functions - Drill
> Affects Versions: Future
> Reporter: Edmon Begoli
> Priority: Minor
> Labels: features
> Fix For: Future
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> I propose implementing string similarity or distance matching functions
> similar to those found in most other databases: soundex, metaphone,
> levenshtein (and more advanced variants such as damerau-levenshtein,
> jaro-winkler, etc.).
> See fuzzystrmatch
> http://www.postgresql.org/docs/9.5/static/fuzzystrmatch.html,
> and pg_similarity http://pgsimilarity.projects.pgfoundry.org/
> for inspiration.
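
For readers unfamiliar with the distance functions named above, here is a
minimal sketch of plain Levenshtein edit distance in Java. It is only an
illustration: the class and method names (LevenshteinSketch, levenshtein,
fuzzyEquals) are hypothetical, and it does not use the Drill UDF interfaces,
the simmetrics library, or the code in the linked repository.

```java
import java.util.Locale;

// Illustrative sketch only; names are hypothetical and not part of Drill.
public final class LevenshteinSketch {

    private LevenshteinSketch() {}

    /** Levenshtein distance: minimum number of single-character
     *  insertions, deletions and substitutions turning a into b. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1]; // distances for the previous row
        int[] curr = new int[b.length() + 1]; // distances for the current row
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // empty prefix of a -> j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // empty prefix of b -> i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed helper: case-insensitive match within maxEdits edits. */
    public static boolean fuzzyEquals(String a, String b, int maxEdits) {
        return levenshtein(a.toLowerCase(Locale.ROOT),
                           b.toLowerCase(Locale.ROOT)) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));           // prints 3
        System.out.println(fuzzyEquals("Main Stret", "main street", 1)); // prints true
    }
}
```

A threshold check like fuzzyEquals is one simple way to tolerate spelling
mistakes in user input; a real UDF would likely expose the raw distance and
leave the threshold to the query.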