[jira] Commented: (HADOOP-4569) Hive: new syntax for specifying custom map/reduce scripts

Zheng Shao (JIRA) Fri, 31 Oct 2008 16:24:37 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644461#action_12644461 ]


Zheng Shao commented on HADOOP-4569:
------------------------------------

The old syntax for running custom map/reduce scripts was:

    FROM (
        FROM pv_users 
        SELECT TRANSFORM(pv_users.userid, pv_users.date)
        AS(key, value) 
        USING 'map_script' 
        CLUSTER BY key ) map_output 
    INSERT OVERWRITE TABLE pv_users_reduced
        SELECT TRANSFORM(map_output.key, map_output.value) 
        AS (date, count)
        USING 'reduce_script'; 

We plan to change that to:

    FROM (
        FROM pv_users 
        MAP pv_users.userid, pv_users.date
        USING 'map_script' 
        AS key, value
        CLUSTER BY key
        ) map_output 
    INSERT OVERWRITE TABLE pv_users_reduced
        REDUCE map_output.key, map_output.value
        USING 'reduce_script'
        AS date, count;
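The comment does not say what `map_script` and `reduce_script` actually compute. As a purely hypothetical illustration, here is a Python sketch that assumes the mapper turns each (userid, date) row into a `date<TAB>1` pair and the reducer counts rows per date. The function names `map_rows` and `reduce_rows` are made up for this example. The reducer relies on CLUSTER BY key: because all rows with the same key reach the same reducer in sorted order, a single pass with `itertools.groupby` is enough.

```python
from itertools import groupby
from typing import Iterable, Iterator


def map_rows(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical map_script: input rows are 'userid<TAB>date'.

    Emits 'date<TAB>1' so that CLUSTER BY key groups the rows by date.
    """
    for line in lines:
        _userid, date = line.rstrip("\n").split("\t")
        yield f"{date}\t1"


def reduce_rows(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical reduce_script: input rows are 'key<TAB>value'.

    Assumes CLUSTER BY key, so all rows with the same key are contiguous.
    Emits 'date<TAB>count'.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"
```

In a real job each function would live in its own executable that reads lines from standard input and prints its results; that wiring is left out here.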


Each script is expected to read tab-separated fields from its standard input
and to write tab-separated fields to its standard output.
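A minimal Python sketch of that streaming contract follows. It assumes one row per newline-terminated line and no escaping of embedded tabs or newlines; those assumptions are mine, not stated in the comment. `run_script` and the column-swapping transform are hypothetical names used only for illustration. A real script would call `run_script(...)` with the default `sys.stdin`/`sys.stdout` streams.

```python
import io
import sys
from typing import Callable, List, TextIO

# A row transform maps the input fields of one row to its output fields.
RowFn = Callable[[List[str]], List[str]]


def run_script(transform: RowFn, instream: TextIO = sys.stdin,
               outstream: TextIO = sys.stdout) -> None:
    """Read tab-separated rows from instream, write tab-separated rows to outstream."""
    for line in instream:
        # Assumption: fields never contain literal tabs or newlines.
        fields = line.rstrip("\n").split("\t")
        outstream.write("\t".join(transform(fields)) + "\n")


# Usage with in-memory streams instead of stdin/stdout;
# the column-swapping transform is a hypothetical example.
demo_in = io.StringIO("u1\t2008-10-31\nu2\t2008-11-01\n")
demo_out = io.StringIO()
run_script(lambda fields: fields[::-1], demo_in, demo_out)
```

Passing the streams explicitly keeps the row logic testable without spawning a process.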


The major changes are:
• Schemaless mapper/reducer: if there is no "AS" clause, we assume "AS key, value",
  which puts the bytes before the first tab into key and the rest into value.
• SELECT TRANSFORM is replaced by MAP/REDUCE to make it clear which script is the
  map step and which is the reduce step.
• USING and AS are reordered so the clause reads in data-flow order: first the
  script, then the columns it outputs.
• Different shuffling and sorting keys are supported via "DISTRIBUTE BY" and
  "SORT BY" ("CLUSTER BY key" means "DISTRIBUTE BY key SORT BY key ASC").
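To make the CLUSTER BY equivalence concrete, the following Python sketch simulates what happens between the map and reduce steps for schemaless output: each line is split at the first tab into key and value, rows are distributed to reducers by a hash of the key (DISTRIBUTE BY), and each reducer's rows are sorted by key ascending (SORT BY ... ASC). The partitioning function is `zlib.crc32` modulo the reducer count, a stand-in chosen for determinism rather than the hash Hive/Hadoop actually uses, and the helper names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str]


def split_schemaless(line: str) -> Row:
    """Key = text before the first tab, value = everything after it.

    Assumption for this sketch: a line with no tab yields an empty value.
    """
    key, _sep, value = line.rstrip("\n").partition("\t")
    return key, value


def cluster_by_key(lines: Iterable[str], num_reducers: int) -> Dict[int, List[Row]]:
    """Simulate CLUSTER BY key == DISTRIBUTE BY key SORT BY key ASC."""
    partitions: Dict[int, List[Row]] = defaultdict(list)
    for line in lines:
        key, value = split_schemaless(line)
        # DISTRIBUTE BY key: equal keys always map to the same reducer.
        # crc32 is a deterministic stand-in for the real partitioner.
        reducer = zlib.crc32(key.encode("utf-8")) % num_reducers
        partitions[reducer].append((key, value))
    # SORT BY key ASC: each reducer sees its rows ordered by key.
    return {r: sorted(rows, key=lambda kv: kv[0]) for r, rows in partitions.items()}
```

Using DISTRIBUTE BY and SORT BY separately would correspond to partitioning on one expression and sorting on another in this sketch.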


> Hive: new syntax for specifying custom map/reduce scripts
> ---------------------------------------------------------
>
>                 Key: HADOOP-4569
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4569
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Zheng Shao
>
> In Hive we not only support SQL but also want to support custom scripts.

