Charles Givre created DRILL-6519:
------------------------------------
Summary: Add String Distance and Phonetic Functions
Key: DRILL-6519
URL: https://issues.apache.org/jira/browse/DRILL-6519
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.14.0
Reporter: Charles Givre
Assignee: Charles Givre
From a recent project, this collection of functions makes it possible to do
fuzzy string matching as well as phonetic matching on strings.
The following functions are all phonetic functions: each maps text to a number
or string based on how the word sounds. For instance, "Jayme" and "Jaime" have
the same soundex value, so these functions can be used to match
similar-sounding words.
* caverphone1(<string>)
* caverphone2(<string>)
* cologne_phonetic(<string>)
* dm_soundex(<string>)
* double_metaphone(<string>)
* match_rating_encoder(<string>)
* metaphone(<string>)
* nysiis(<string>)
* refined_soundex(<string>)
* soundex(<string>)
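To make the "Jayme"/"Jaime" example concrete, here is a minimal sketch of the classic American Soundex encoding in Python. It is only an illustration of the idea behind phonetic codes, not the code used by the Drill functions listed above, and the Drill implementation may differ in edge cases.

```python
# Illustrative sketch of classic American Soundex; not Drill's implementation.
_GROUPS = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}
# Invert the table so each consonant maps to its digit.
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def soundex(word):
    """Return a 4-character Soundex code, or "" if word has no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")   # code of the previous coded letter
    for c in letters[1:]:
        if c in "HW":
            continue                    # H and W do not separate equal codes
        code = _CODES.get(c, "")        # vowels and Y have no code
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets prev, allowing repeats
    return (result + "000")[:4]         # pad with zeros, truncate to 4


print(soundex("Jayme"), soundex("Jaime"))  # J500 J500
```

Both spellings collapse to the same code because the vowels and the Y are dropped and only the M contributes a digit.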
Additionally, there is the
{code:java}
sounds_like(<string1>,<string2>){code}
function, which can be used to find strings that sound similar. For instance:
{code:java}
SELECT *
FROM <data>
WHERE sounds_like( last_name, 'Gretsky' )
{code}
h2. String Distance Functions
In addition to the phonetic functions, there is a series of distance functions
that measure the difference between two strings. The functions include:
* cosine_distance(<string1>,<string2>)
* fuzzy_score(<string1>,<string2>)
* hamming_distance(<string1>,<string2>)
* jaccard_distance(<string1>,<string2>)
* jaro_distance(<string1>,<string2>)
* levenshtein_distance(<string1>,<string2>)
* longest_common_substring_distance(<string1>,<string2>)
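As a rough illustration of what three of these measures compute, the following Python sketch implements textbook versions of Levenshtein, Hamming, and Jaccard distance. The function names mirror the list above for readability only. The definitions are the common textbook ones (Jaccard is taken over sets of characters here, which is an assumption), and the Drill functions may differ in return types or edge-case handling.

```python
# Textbook sketches of three string distances; not Drill's implementation.

def levenshtein_distance(a, b):
    """Minimum number of single-character inserts, deletes, or substitutions."""
    # prev[j] is the distance between the prefix of a seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| over the sets of characters (an assumption)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0  # two empty strings are treated as identical
    return 1.0 - len(sa & sb) / len(sa | sb)


print(levenshtein_distance("kitten", "sitting"))  # 3
print(hamming_distance("Gretsky", "Gretzky"))     # 1
print(jaccard_distance("abc", "abd"))             # 0.5
```

Levenshtein handles strings of different lengths, while Hamming is only defined for equal-length inputs, which is why the sketch raises an error in that case.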