[
https://issues.apache.org/jira/browse/HIVE-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alexander Pivovarov updated HIVE-9738:
--------------------------------------
Attachment: HIVE-9738.2.patch
patch #2
> create SOUNDEX udf
> ------------------
>
> Key: HIVE-9738
> URL: https://issues.apache.org/jira/browse/HIVE-9738
> Project: Hive
> Issue Type: Improvement
> Components: UDF
> Reporter: Alexander Pivovarov
> Assignee: Alexander Pivovarov
> Attachments: HIVE-9738.1.patch, HIVE-9738.2.patch
>
>
> Soundex is an encoding used to relate similar-sounding names, but it can also
> be used as a general-purpose scheme to find words with similar phonemes.
> The American Soundex System
> The Soundex code consists of the first letter of the name followed by three
> digits. These three digits are determined by dropping the letters a, e, i, o,
> u, h, w and y and taking three digits from the remaining letters of the name
> according to the table below. There are only two additional rules. (1) If two
> or more consecutive letters have the same code, they are coded as one letter.
> (2) If there are not enough letters to make the three digits, the remaining
> digits are set to zero.
> Soundex Table
> 1  b, f, p, v
> 2  c, g, j, k, q, s, x, z
> 3  d, t
> 4  l
> 5  m, n
> 6  r
> Examples:
> Miller M460
> Peterson P362
> Peters P362
> Auerbach A612
> Uhrbach U612
> Moskowitz M232
> Moskovitz M213
> Implementation:
> http://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/language/Soundex.html
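
The rules and table quoted above can be sketched in a few lines of Java. This is only an illustration of the encoding, not the code in the attached patch and not the Commons Codec API: the class name SoundexSketch and the method encode are hypothetical. It also makes two assumptions that the description does not spell out: "consecutive" is read as adjacent in the original name, so a vowel between two letters with the same code separates them, while h and w do not; and non-letter characters are skipped.

```java
class SoundexSketch {
    // Digit for each letter A..Z, taken from the table in the description.
    // '0' marks letters that carry no code (a, e, i, o, u, h, w, y).
    private static final String CODES = "01230120022455012623010202";

    // Hypothetical helper: returns the 4-character code, or "" if the
    // input contains no ASCII letters.
    static String encode(String name) {
        StringBuilder out = new StringBuilder();
        char last = '0'; // code of the previous significant letter
        for (char raw : name.toCharArray()) {
            char c = Character.toUpperCase(raw);
            if (c < 'A' || c > 'Z') {
                continue; // assumption: ignore anything that is not A..Z
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as-is
                last = code;
                continue;
            }
            if (c == 'H' || c == 'W') {
                continue; // assumption: h and w do not separate equal codes
            }
            if (code != '0' && code != last) {
                out.append(code); // rule (1): runs of the same code count once
            }
            last = code; // a vowel resets the run
            if (out.length() == 4) {
                break; // first letter plus three digits
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0'); // rule (2): pad with zeros
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] names = {"Miller", "Peterson", "Peters", "Auerbach",
                          "Uhrbach", "Moskowitz", "Moskovitz"};
        for (String n : names) {
            System.out.println(n + " " + encode(n));
        }
    }
}
```

Run against the names listed under Examples, this sketch produces the same codes as the description (for instance Miller gives M460 and Moskovitz gives M213). A real UDF would also have to decide how to handle NULL and non-ASCII input, which is outside the scope of this sketch.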
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)