[ https://issues.apache.org/jira/browse/HADOOP-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644461#action_12644461 ]
Zheng Shao commented on HADOOP-4569: ------------------------------------ The old syntax for doing that was: FROM ( FROM pv_users SELECT TRANSFORM(pv_users.userid, pv_users.date) AS(key, value) USING 'map_script' CLUSTER BY key ) map_output INSERT OVERWRITE TABLE pv_users_reduced SELECT TRANSFORM(map_output.key, map_output.value) AS (date, count) USING 'reduce_script'; We plan to change that to: FROM ( FROM pv_users MAP pv_users.userid, pv_users.date USING 'map_script' AS key, value CLUSTER BY key ) map_output INSERT OVERWRITE TABLE pv_users_reduced REDUCE map_output.key, map_output.value USING 'reduce_script' AS date, count; The script is expected to read tab-separated fields, and also generate tab-separated fields. The major changes are: • Schemaless Mapper/Reducer: if there is "AS" we assume "AS key,value" which takes the bytes before the first tab into key, and the rest to value. • SELECT TRANSFORM changed to MAP/REDUCE to make it clear what is map and what is reduce. • Reordered USING and AS to make it clearer. * Support different shuffling/sorting keys by using "DISTRIBUTE BY" and "SORT BY" ("CLUSTER BY key" means "DISTRIBUTE BY key SORT BY key ASC") > Hive: new syntax for specifying custom map/reduce scripts > --------------------------------------------------------- > > Key: HADOOP-4569 > URL: https://issues.apache.org/jira/browse/HADOOP-4569 > Project: Hadoop Core > Issue Type: Improvement > Reporter: Zheng Shao > > In Hive we not only supports SQL but also want to support custom scripts. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.