[drill] branch gh-pages updated: Reformat string-distance-functions.md.

dzamo Thu, 08 Jul 2021 01:38:59 -0700

This is an automated email from the ASF dual-hosted git repository.

dzamo pushed a commit to branch gh-pages
in repository https://gitbox.apache.org/repos/asf/drill.git



The following commit(s) were added to refs/heads/gh-pages by this push:
     new 8619175  Reformat string-distance-functions.md.
8619175 is described below

commit 8619175ee4fccf3c116b2fb87f5b59d49da078fc
Author: James Turton <[email protected]>
AuthorDate: Thu Jul 8 10:16:26 2021 +0200

    Reformat string-distance-functions.md.
---
 .../050-aggregate-and-aggregate-statistical.md     |  2 -
 .../sql-functions/062-string-distance-functions.md | 63 ++++++----------------
 2 files changed, 15 insertions(+), 50 deletions(-)

diff --git a/_docs/en/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md b/_docs/en/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md
index 6a199a5..7690aab 100644
--- a/_docs/en/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md
+++ b/_docs/en/sql-reference/sql-functions/050-aggregate-and-aggregate-statistical.md
@@ -9,7 +9,6 @@ parent: "SQL Functions"
 The following table lists the aggregate functions that you can use in Drill
 queries.
 
-|\-------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------|
 
 | **Function**                                                 | **Argument 
Type**                                                                          
                                               | **Return Type**                
                                                                                
                      |
 | ------------------------------------------------------------ | 
-----------------------------------------------------------------------------------------------------------------------------------------
 | 
------------------------------------------------------------------------------------------------------------------------------------
 |
@@ -21,7 +20,6 @@ queries.
 | COUNT(\[DISTINCT\] expression)                               | any           
                                                                                
                                            | BIGINT                            
                                                                                
                   |
 | MAX(expression), MIN(expression)                             | BINARY, 
DECIMAL, VARCHAR, DATE, TIME, or TIMESTAMP                                      
                                                  | Same   as argument type     
                                                                                
                         |
 | SUM(expression)                                              | SMALLINT, 
INTEGER, BIGINT, FLOAT, DOUBLE, DECIMAL, INTERVAL                               
                                                | DECIMAL for DECIMAL   
argument,     BIGINT for any integer-type argument (including BIGINT), DOUBLE 
for   floating-point arguments   |
-| -------------------------------------------                  | 
-----------------------------------------------------------------------------------------------------------------------------------------
 | 
------------------------------------------------------------------------------------------------------------------------------------
 |
 
 - Drill 1.14 and later supports the ANY_VALUE function.
 - Starting in Drill 1.14, the DECIMAL data type is enabled by default.
diff --git a/_docs/en/sql-reference/sql-functions/062-string-distance-functions.md b/_docs/en/sql-reference/sql-functions/062-string-distance-functions.md
index c0d0c3f..09f069c 100644
--- a/_docs/en/sql-reference/sql-functions/062-string-distance-functions.md
+++ b/_docs/en/sql-reference/sql-functions/062-string-distance-functions.md
@@ -4,7 +4,9 @@ slug: "String Distance Functions"
 parent: "SQL Functions"
 ---
 
-Starting in version 1.14, Drill supports string distance functions. Typically, 
you use string distance functions in the WHERE clause of a query to measure the 
difference between two strings. For example, if you want to match a street 
address, but do not know how to spell a street name, you could issue a query on 
the data source with the street addresses:
+**Introduced in release**: 1.14.
+
+Drill provides functions for calculating a variety of well-known string distance metrics.  Typically, you use string distance functions in the WHERE clause of a query to measure the difference between two strings.  For example, if you want to match a street address, but do not know how to spell a street name, you could execute a query on the data source with the street addresses:
 
        SELECT street_address
        FROM address-data
@@ -15,57 +17,22 @@ The search would return addresses from rows with street addresses similar to 123
        1234 N. Quail Lane
        1234 N Quaile Lan  
 
-Drill supports the following string distance functions:   
-
-- 
[`cosine_distance(string1,string2)`]({{site.baseurl}}/docs/string-distance-functions/#cosine_distance(string1,string2))
-- 
[`fuzzy_score(string1,string2)`]({{site.baseurl}}/docs/string-distance-functions/#fuzzy_score(string1,string2))
-- 
[`hamming_distance(string1,string2)`]({{site.baseurl}}/docs/string-distance-functions/#hamming_distance-(string1,string2))
-- 
[`jaccard_distance(string1,string2)`]({{site.baseurl}}/docs/string-distance-functions/#jaccard_distance-(string1,string2))
-- 
[`jaro_distance(string1,string2)`]({{site.baseurl}}/docs/string-distance-functions/#jaro_distance-(string1,string2))
-- 
[`levenshtein_distance(string1,string2)`]({{site.baseurl}}/docs/string-distance-functions/#levenshtein_distance-(string1,string2))
-- 
[`longest_common_substring_distance(string1,string2)`]({{site.baseurl}}/docs/string-distance-functions/#longest_common_substring_distance(string1,string2))
  
-
-
-## Function Descriptions  
-The following sections describe each of the string distance functions that 
Drill supports.   
-
-### cosine_distance(string1,string2)  
- 
-Calculates the cosine distance between two strings.  
-
-
-### fuzzy_score(string1,string2)  
-
-Calculates the cosine distance between two strings. A matching algorithm that 
is similar to the searching algorithms implemented in editors such as Sublime 
Text, TextMate, Atom, and others. One point is given for every matched 
character. Subsequent matches yield two bonus points. A higher score indicates 
a higher similarity. 
-       
-
-### hamming_distance (string1,string2)  
-
-The hamming distance between two strings of equal length is the number of 
positions at which the corresponding symbols are different. For further 
explanation about the Hamming Distance, refer to 
http://en.wikipedia.org/wiki/Hamming_distance.   
-
-
-### jaccard_distance (string1,string2)  
-
-Measures the Jaccard distance of two sets of character sequence. [Jaccard 
distance](https://en.wikipedia.org/wiki/Jaccard_index) is the dissimilarity 
between two sets. It is the complementary of Jaccard similarity.   
-
-
-### jaro_distance (string1,string2)
-
-A similarity algorithm indicating the percentage of matched characters between 
two character sequences. The Jaro measure is the weighted sum of percentage of 
matched characters from each file and transposed characters. Winkler increased 
this measure for matching initial characters. This implementation is based on 
the [Jaro Winkler similarity 
algorithm](https://en.wikipedia.org/wiki/Jaro–Winkler_distance).  
-
-
-### levenshtein_distance (string1,string2)
-An algorithm for measuring the difference between two character sequences. 
This is the number of changes needed to change one sequence into another, where 
each change is a single character modification (deletion, insertion, or 
substitution).
+Drill supports the following string distance functions.
 
+|Function|Return type|Description|
+|-|-|-|
+|COSINE_DISTANCE(string1, string2)|FLOAT8|Returns the cosine distance, a measurement of the angular distance between two strings regarded as word vectors.|
+|FUZZY_SCORE(string1, string2)|FLOAT8|Returns the score from a fuzzy string 
matching algorithm[^1].  Higher scores indicate greater similarity.|
+|HAMMING_DISTANCE(string1, string2)|FLOAT8|Returns the [Hamming 
distance](http://en.wikipedia.org/wiki/Hamming_distance) between two strings of 
equal length, a measurement of the number of positions at which corresponding 
characters differ.|
+|JACCARD_DISTANCE(string1, string2)|FLOAT8|Returns the [Jaccard 
distance](https://en.wikipedia.org/wiki/Jaccard_index) between two strings 
regarded as unordered sets of characters, a measurement of the overlap between 
two sets.|
+|JARO_DISTANCE(string1, string2)|FLOAT8|Returns the [Jaro-Winkler 
distance](https://en.wikipedia.org/wiki/Jaro–Winkler_distance), a measurement 
of the fraction of matching characters between two strings.|
+|LEVENSHTEIN_DISTANCE(string1, string2)|FLOAT8|Returns the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) between two strings, a measurement of the number of single-character modifications needed to change one string into another.|
+|LONGEST\_COMMON\_SUBSTRING_DISTANCE(string1, string2)|FLOAT8|Returns the 
length of the [longest common 
substring](https://en.wikipedia.org/wiki/Longest_common_subsequence_problem) 
across two strings[^2].|
 
-### longest\_common\_substring_distance(string1,string2)  
 
-Returns the length of the longest sub-sequence that two strings have in common.
-Two strings that are entirely different, return a value of 0, and two strings 
that return a value of the commonly shared length implies that the strings are 
completely the same in value and position. This implementation is based on the 
[Longest Commons Substring 
algorithm](https://en.wikipedia.org/wiki/Longest_common_subsequence_problem).  
- 
+[^1]: Calculates the score from a matching algorithm similar to the searching 
algorithms implemented in editors such as Sublime Text, TextMate, Atom, and 
others.  One point is given for every matched character.  Subsequent matches 
yield two bonus points.
 
-**Note:** Generally this algorithm is fairly inefficient, as for length m, n 
of the input
-CharSequence's left and right respectively, the runtime of the algorithm is 
O(m*n).  
+[^2]: Generally this algorithm is fairly inefficient: for input CharSequences left and right of lengths m and n respectively, the runtime of the algorithm is O(m*n).
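
To make concrete what two of the metrics in the table above measure, here is a minimal Python sketch of the Hamming and Levenshtein distances. It is an illustration only, not Drill's implementation: the function names are hypothetical, and the results are plain integers rather than the FLOAT8 values the Drill functions return.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # Hamming distance is only defined for strings of equal length.
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```

For instance, `levenshtein_distance("Quail", "Quaile")` is 1 (a single insertion), which is why a small distance threshold in a WHERE clause can tolerate misspellings like those in the street address example.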
