Norbert Luksa created IMPALA-8752:
-------------------------------------
Summary: Add Jaro-Winkler edit distance and similarity built-in functions
Key: IMPALA-8752
URL: https://issues.apache.org/jira/browse/IMPALA-8752
Project: IMPALA
Issue Type: New Feature
Reporter: Norbert Luksa
Assignee: Norbert Luksa
References:
* [Apache Commons - JaroWinklerDistance|https://commons.apache.org/proper/commons-text/apidocs/org/apache/commons/text/similarity/JaroWinklerDistance.html]
* [Apache Commons - JaroWinklerSimilarity|https://commons.apache.org/proper/commons-text/apidocs/org/apache/commons/text/similarity/JaroWinklerSimilarity.html]
* [Oracle - JARO_WINKLER / JARO_WINKLER_SIMILARITY|https://oracle-base.com/articles/11g/utl_match-string-matching-in-oracle]
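All of the referenced functions build on the Jaro similarity. For orientation, a minimal Python sketch of the textbook definition follows, assuming a matching window of max(len)/2 - 1 and counting transpositions as half the number of out-of-order matched characters. The name `jaro_similarity` is hypothetical (not an Impala or Apache Commons API), and details such as empty-string handling may differ between the referenced implementations.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; hypothetical helper, not an Impala or Commons API."""
    if not s1 and not s2:
        return 1.0  # convention assumed for this sketch: two empty strings are identical
    if not s1 or not s2:
        return 0.0
    # Two equal characters only count as a match if they lie within this window.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len(s2))):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk the matched characters of both strings in order; each mismatch is half a transposition.
    half_transpositions = 0
    j = 0
    for i, c in enumerate(s1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if c != s2[j]:
                half_transpositions += 1
            j += 1
    m = float(matches)
    t = half_transpositions / 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3.0
```

For example, MARTHA and MARHTA match on all six characters with one transposition, giving 17/18 ≈ 0.944.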
Notable differences:
* For similarity, the Oracle version returns a result normalized to the range 0 to 100 rather than 0 to 1.
* In the Apache version, null inputs result in exceptions.
* Apache rounds the results to two decimal places.
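These differences concern how results and NULLs are surfaced rather than the core algorithm. A minimal Python sketch of the three behaviours, assuming a similarity function returning values in [0, 1] is passed in: `null_safe`, `to_percent_scale` and `round_two_digits` are hypothetical names, and the round-half-up rule is an assumption, not documented Oracle or Commons behaviour.

```python
import math
from typing import Callable, Optional

# Any function returning a similarity in [0, 1], e.g. a Jaro-Winkler implementation.
Similarity = Callable[[str, str], float]


def null_safe(sim: Similarity, a: Optional[str], b: Optional[str]) -> Optional[float]:
    """SQL-style NULL propagation: return None instead of raising on NULL input."""
    if a is None or b is None:
        return None
    return sim(a, b)


def to_percent_scale(similarity: float) -> int:
    """Oracle-style 0..100 result; round-half-up is an assumption of this sketch."""
    return math.floor(similarity * 100 + 0.5)


def round_two_digits(value: float) -> float:
    """Commons-style rounding to two decimal places (half-up assumed)."""
    return math.floor(value * 100 + 0.5) / 100
```

Returning NULL for NULL input follows how SQL built-ins usually behave, which is one argument for not copying the Commons exception.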
The scaling factor of the algorithm could be exposed as an optional argument with a default value.
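In the standard Winkler modification, the scaling factor p weights the length l of the common prefix (capped at four characters): sim_w = sim_j + l * p * (1 - sim_j). The conventional default is p = 0.1, and p should not exceed 0.25, otherwise the result can exceed 1. A minimal Python sketch of such an optional argument; the name `jaro_winkler` and the choice to raise on an out-of-range value are assumptions. Some implementations additionally apply the boost only when the Jaro similarity exceeds a threshold (often 0.7); this sketch omits that.

```python
def jaro_winkler(jaro: float, s1: str, s2: str, scaling_factor: float = 0.1) -> float:
    """Apply the Winkler prefix boost to a precomputed Jaro similarity (hypothetical helper)."""
    # With a 4-character prefix, p > 0.25 could push the result above 1.
    if not 0.0 <= scaling_factor <= 0.25:
        raise ValueError("scaling_factor must be between 0 and 0.25")
    # Length of the common prefix, capped at 4 characters.
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return jaro + prefix * scaling_factor * (1.0 - jaro)
```

With the default p = 0.1, MARTHA and MARHTA (Jaro similarity 17/18, common prefix MAR) score about 0.961. One common convention then derives the distance as 1 minus the similarity.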
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)