[ https://issues.apache.org/jira/browse/HIVE-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585025#comment-13585025 ]
Krishna commented on HIVE-4053:
-------------------------------

I've implemented the 'Refined Soundex' algorithm as a GenericUDF and, since I'm a newbie, would like to share it for review by experts.

Change details:
A new Java class is created: GenericUDFRefinedSoundex.java
An entry is added to FunctionRegistry.java: registerGenericUDF("soundex_ref", GenericUDFRefinedSoundex.class);

Both files are attached to the email.

I'm planning to implement the other phonetic algorithms and submit them all as a single patch. I understand there are many other steps I need to finish before a patch is ready, but for now it would be great if you could review the attached code and provide feedback.

Here are the details of the Refined Soundex algorithm:

First letter is stored
Subsequent letters are replaced by numbers as defined below:
 * B, P => 1
 * F, V => 2
 * C, K, S => 3
 * G, J => 4
 * Q, X, Z => 5
 * D, T => 6
 * L => 7
 * M, N => 8
 * R => 9
 * Other letters => 0
Consecutive letters belonging to the same group are replaced by a single digit

Example:
> SELECT soundex_ref('Carren') FROM src LIMIT 1;
> C30908

> Add support for phonetic algorithms in Hive
> -------------------------------------------
>
>                 Key: HIVE-4053
>                 URL: https://issues.apache.org/jira/browse/HIVE-4053
>             Project: Hive
>          Issue Type: New Feature
>          Components: UDF
>            Reporter: Krishna
>         Attachments: FunctionRegistry.java, GenericUDFRefinedSoundex.java
>
>
> The following phonetic algorithms, which are very useful in search, should be considered:
>     Soundex
>     Refined Soundex
>     Daitch–Mokotoff Soundex
>     Metaphone and Double Metaphone
>     New York State Identification and Intelligence System (NYSIIS)
>     Caverphone
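
For reviewers who want to check the Refined Soundex rules in the comment without building Hive, here is a minimal standalone sketch in plain Java. It is not the attached GenericUDFRefinedSoundex.java. The class name RefinedSoundexSketch and the method encode are hypothetical. It makes three assumptions: the first letter is kept and is also encoded like every other letter (this is what reproduces the 'Carren' => C30908 example, although the description only mentions subsequent letters), non-letter characters are skipped, and input is upper-cased first.

```java
import java.util.Locale;

// Hypothetical helper, not part of Hive or of the attached patch.
final class RefinedSoundexSketch {

    // Digit for each letter A..Z, following the table in the comment:
    //                                   ABCDEFGHIJKLMNOPQRSTUVWXYZ
    private static final String CODES = "01360240043788015936020505";

    static String encode(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder();
        char previous = '\0'; // code of the last letter seen, for duplicate collapsing
        for (char c : input.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                continue; // assumption: ignore anything that is not A-Z
            }
            if (out.length() == 0) {
                out.append(c); // the first letter is stored as-is
            }
            char code = CODES.charAt(c - 'A');
            if (code != previous) {
                out.append(code); // consecutive letters of one group yield one digit
            }
            previous = code;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Carren")); // the example from the comment
    }
}
```

With these assumptions, encode("Carren") keeps 'C', then emits 3 (C), 0 (A), 9 (the two Rs collapse), 0 (E) and 8 (N), which matches the C30908 shown in the example query.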