[ https://issues.apache.org/jira/browse/DRILL-6519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548584#comment-16548584 ]
Bridget Bevens commented on DRILL-6519:
---------------------------------------

Hi [~cgivre]

I've added the content for the phonetic functions [here|https://drill.apache.org/docs/phonetic-functions/].

I've added the content for the string distance functions [here|https://drill.apache.org/docs/string-distance-functions/].

I did not see the sounds_like(<string1>,<string2>) function in the changed files of the pull request, so I did not add it to the doc. Please let me know if this function is supported so I can add it to the doc.

I'm setting the doc label to doc-complete, but will make any changes you suggest if you have feedback for me.

Thanks,
Bridget

> Add String Distance and Phonetic Functions
> ------------------------------------------
>
>                 Key: DRILL-6519
>                 URL: https://issues.apache.org/jira/browse/DRILL-6519
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.14.0
>
>
> From a recent project, this collection of functions makes it possible to do
> fuzzy string matching as well as phonetic matching on strings.
>
> The following functions are all phonetic functions and map text to a number
> or string based on how the word sounds. For instance "Jayme" and "Jaime"
> have the same soundex values and hence these functions can be used to match
> similar sounding words.
> * caverphone1(<string>)
> * caverphone2(<string>)
> * cologne_phonetic(<string>)
> * dm_soundex(<string>)
> * double_metaphone(<string>)
> * match_rating_encoder(<string>)
> * metaphone(<string>)
> * nysiis(<string>)
> * refined_soundex(<string>)
> * soundex(<string>)
> Additionally, there is the
> {code:java}
> sounds_like(<string1>,<string2>){code}
> function which can be used to find strings that sound similar. For instance:
>
> {code:java}
> SELECT *
> FROM <data>
> WHERE sounds_like( last_name, 'Gretsky' )
> {code}
> h2. String Distance Functions
> In addition to the phonetic functions, there are a series of distance
> functions which measure the difference between two strings. The functions
> include:
> * cosine_distance(<string1>,<string2>)
> * fuzzy_score(<string1>,<string2>)
> * hamming_distance(<string1>,<string2>)
> * jaccard_distance(<string1>,<string2>)
> * jaro_distance(<string1>,<string2>)
> * levenshtein_distance(<string1>,<string2>)
> * longest_common_substring_distance(<string1>,<string2>)
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
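
For readers unfamiliar with the two function families in the quoted description, here is a minimal Python sketch of what a phonetic encoder and a string distance compute. It is an illustration only and is not Drill's implementation: the helper names `soundex` and `levenshtein_distance` are borrowed from the function list above, the encoder follows the classic American Soundex rules, and Drill's own results may differ in edge cases.

```python
def soundex(word):
    """Classic American Soundex: first letter plus three digits (illustrative only)."""
    groups = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
              ("L", "4"), ("MN", "5"), ("R", "6"))
    codes = {ch: digit for letters, digit in groups for ch in letters}

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]  # pad or truncate to four characters


def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


# "Jayme" and "Jaime" get the same Soundex code, as the description notes.
print(soundex("Jayme"), soundex("Jaime"))
# The two names differ by a single substitution.
print(levenshtein_distance("Jayme", "Jaime"))
```

A phonetic code is useful for equality-style matching (same code means "sounds alike"), while a distance gives a graded score that can be compared against a threshold.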