[
https://issues.apache.org/jira/browse/DRILL-6519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arina Ielchiieva reassigned DRILL-6519:
---------------------------------------
Assignee: Arina Ielchiieva (was: Charles Givre)
> Add String Distance and Phonetic Functions
> ------------------------------------------
>
> Key: DRILL-6519
> URL: https://issues.apache.org/jira/browse/DRILL-6519
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Charles Givre
> Assignee: Arina Ielchiieva
> Priority: Major
> Labels: doc-impacting
> Fix For: 1.14.0
>
>
> This collection of functions, developed for a recent project, makes it
> possible to do fuzzy string matching as well as phonetic matching on strings.
>
> The following functions are all phonetic functions: each maps text to a code
> (a number or string) based on how the word sounds. For instance, "Jayme" and
> "Jaime" have the same Soundex value, so these functions can be used to match
> similar-sounding words.
> * caverphone1(<string>)
> * caverphone2(<string>)
> * cologne_phonetic(<string>)
> * dm_soundex(<string>)
> * double_metaphone(<string>)
> * match_rating_encoder(<string>)
> * metaphone(<string>)
> * nysiis(<string>)
> * refined_soundex(<string>)
> * soundex(<string>)
> Additionally, there is the
> {code:java}
> sounds_like(<string1>,<string2>){code}
> function, which can be used to find strings that sound similar. For instance:
>
> {code:java}
> SELECT *
> FROM <data>
> WHERE sounds_like( last_name, 'Gretsky' )
> {code}
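
To make the phonetic idea concrete, here is a minimal Python sketch of classic American Soundex. It is an illustration only and not necessarily the implementation behind the proposed Drill functions. The helper `sounds_like` below is hypothetical: it assumes a plain Soundex comparison, whereas the issue does not say which algorithm the Drill `sounds_like` function uses.

```python
def soundex(word: str) -> str:
    """Return the 4-character American Soundex code for a word (illustrative sketch)."""
    # Consonant groups that share a digit; vowels and Y have no code.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    chars = [c for c in word.upper() if c.isascii() and c.isalpha()]
    if not chars:
        return ""

    result = chars[0]                  # the first letter is kept as-is
    prev = codes.get(chars[0], "")     # its code still suppresses a repeat
    for ch in chars[1:]:
        if ch in "HW":
            continue                   # H and W do not separate equal codes
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code                    # a vowel resets prev to ""
    return (result + "000")[:4]        # pad or truncate to 4 characters


def sounds_like(a: str, b: str) -> bool:
    """Hypothetical helper: assumes 'sounds like' means equal Soundex codes."""
    return soundex(a) == soundex(b)


print(soundex("Jayme"), soundex("Jaime"))   # J500 J500
print(sounds_like("Gretsky", "Gretzky"))    # True
```

Under this sketch, the misspelled 'Gretsky' in the query above and the spelling 'Gretzky' both encode to G632, which is why a phonetic comparison can match them where a plain equality test would not.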
> h2. String Distance Functions
> In addition to the phonetic functions, there is a series of distance
> functions that measure the difference between two strings. The functions
> include:
> * cosine_distance(<string1>,<string2>)
> * fuzzy_score(<string1>,<string2>)
> * hamming_distance(<string1>,<string2>)
> * jaccard_distance(<string1>,<string2>)
> * jaro_distance(<string1>,<string2>)
> * levenshtein_distance(<string1>,<string2>)
> * longest_common_substring_distance(<string1>,<string2>)
>
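
As a rough illustration of what two of the listed distance measures compute, here is a minimal Python sketch of the textbook Hamming and Levenshtein definitions. The Python function names simply mirror the Drill names above; the exact return types and edge-case behavior of the Drill functions (for example, how unequal lengths or NULLs are handled) are not stated in this issue, so the choices below are assumptions.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # Assumption: reject unequal lengths, as the textbook definition does.
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


print(hamming_distance("karolin", "kathrin"))     # 3
print(levenshtein_distance("kitten", "sitting"))  # 3
```

A small distance indicates near-identical strings, so a fuzzy match is typically expressed as a threshold on the distance rather than an equality test.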
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)