[ https://issues.apache.org/jira/browse/HIVE-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HIVE-9738 started by Alexander Pivovarov.
-------------------------------------------------
> create SOUNDEX udf
> ------------------
>
>                 Key: HIVE-9738
>                 URL: https://issues.apache.org/jira/browse/HIVE-9738
>             Project: Hive
>          Issue Type: Improvement
>          Components: UDF
>            Reporter: Alexander Pivovarov
>            Assignee: Alexander Pivovarov
>         Attachments: HIVE-9738.1.patch
>
>
> Soundex is an encoding used to relate similar names, but it can also be used
> as a general-purpose scheme to find words with similar phonemes.
> The American Soundex System
> The soundex code consists of the first letter of the name followed by three
> digits. These three digits are determined by dropping the letters a, e, i, o,
> u, h, w and y and encoding the remaining letters of the name according to the
> table below. There are only two additional rules. (1) If two or more
> consecutive letters have the same code, they are coded as one letter.
> (2) If there are too few letters to make the three digits, the remaining
> digits are set to zero.
> Soundex Table
>  1  b, f, p, v
>  2  c, g, j, k, q, s, x, z
>  3  d, t
>  4  l
>  5  m, n
>  6  r
> Examples:
>  Miller     M460
>  Peterson   P362
>  Peters     P362
>  Auerbach   A612
>  Uhrbach    U612
>  Moskowitz  M232
>  Moskovitz  M213
> Implementation:
> http://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/language/Soundex.html

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
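
The encoding rules quoted in the issue description can be sketched as follows. This is only an illustration in Python, not the attached patch and not the Commons Codec implementation; the function name `soundex` and the table name `SOUNDEX_CODES` are made up for this example. Two details are assumptions that the description does not spell out: vowels and y separate letters with equal codes while h and w do not (the usual American Soundex convention), and a letter with the same code as the first letter is skipped. The seven listed examples come out the same under either reading.

```python
# Digit assigned to each consonant, taken from the Soundex table in the issue.
SOUNDEX_CODES = {}
for digit, group in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                     "4": "l", "5": "mn", "6": "r"}.items():
    for letter in group:
        SOUNDEX_CODES[letter] = digit


def soundex(name):
    """Return the 4-character Soundex code of name, or "" if it has no letters."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # Assumption: the first letter's own code counts as the "previous" code.
    previous = SOUNDEX_CODES.get(first)
    for ch in letters[1:]:
        if ch in "hw":
            continue  # assumption: h and w are dropped without separating codes
        code = SOUNDEX_CODES.get(ch)
        if code is None:
            previous = None  # a, e, i, o, u, y: dropped, but they separate codes
            continue
        if code != previous:  # rule (1): consecutive equal codes count once
            digits.append(code)
        previous = code
        if len(digits) == 3:
            break
    # Rule (2): pad with zeros when fewer than three digits were produced.
    return first.upper() + "".join(digits).ljust(3, "0")


for example in ["Miller", "Peterson", "Peters", "Auerbach",
                "Uhrbach", "Moskowitz", "Moskovitz"]:
    print(example, soundex(example))
```

Running the loop prints the same codes as the Examples list in the description (M460, P362, P362, A612, U612, M232, M213).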