[ https://issues.apache.org/jira/browse/HIVE-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777702#action_12777702 ]

Paul Yang commented on HIVE-655:
--------------------------------

So I had a discussion with Ning and Namit this morning, and a slightly different
syntax for UDTFs was proposed. Something like:

{code}
SELECT pageid, adid FROM myTable LATERAL VIEW explode(adid_list) AS adid ;
{code}

where the LATERAL VIEW keyword associates the given UDTF with the table in the 
FROM clause. As Ning pointed out, one of the issues with having the UDTF in the 
SELECT is that queries like the following

{code}
SELECT pageid, explode(adid_list), count(1) FROM myTable GROUP BY pageid;
{code}

are a bit confusing, as it's not clear what they are supposed to do. We could
disallow these sorts of operations, but that makes things more complicated for
the user. Using LATERAL VIEW also addresses Raghotham's concern about having to
specify the input for the UDTF. The UDTF still returns one column, though
multiple values can be returned via an array or a struct.
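To make the proposed LATERAL VIEW behavior concrete, here is a minimal plain-Python model. It is not Hive code and not how Hive implements the feature. It assumes LATERAL VIEW acts like a per-row join: for each row of the FROM table, the UDTF is called on that row's input column, and each value it generates is paired with the original row. `lateral_view`, `explode_with_index`, and the sample rows are hypothetical. `explode` here only mimics the proposed UDTF.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

# A row is modeled as a dict from column name to value (an assumption of this sketch).
Row = Dict[str, Any]


def explode(values: List[Any]) -> Iterator[Any]:
    """Mimics the proposed explode UDTF: yields one value per array element."""
    for value in values:
        yield value


def explode_with_index(values: List[Any]) -> Iterator[Dict[str, Any]]:
    """Hypothetical UDTF: still one output column, but each value is a
    struct-like dict carrying two fields (position and element)."""
    for idx, value in enumerate(values):
        yield {"idx": idx, "adid": value}


def lateral_view(rows: Iterable[Row],
                 udtf: Callable[[Any], Iterable[Any]],
                 input_col: str,
                 alias: str) -> Iterator[Row]:
    """Hypothetical model of LATERAL VIEW: for each source row, call the UDTF
    on that row's input column and emit the source row joined with each
    generated value, stored under `alias`."""
    for row in rows:
        for generated in udtf(row[input_col]):
            joined = dict(row)  # copy so the source row is not modified
            joined[alias] = generated
            yield joined


# Hypothetical sample data standing in for myTable.
my_table = [
    {"pageid": "front", "adid_list": [1, 2, 3]},
    {"pageid": "contact", "adid_list": [3, 4]},
]

# Roughly: SELECT pageid, adid FROM myTable LATERAL VIEW explode(adid_list) AS adid
result = [(r["pageid"], r["adid"])
          for r in lateral_view(my_table, explode, "adid_list", "adid")]
print(result)
# [('front', 1), ('front', 2), ('front', 3), ('contact', 3), ('contact', 4)]
```

In this sketch, a source row whose adid_list is empty produces no output rows. Passing `explode_with_index` instead of `explode` shows how a single output column can still carry several values as a struct.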

Zheng, do you have any thoughts about the proposed syntax? I know that from
early on UDTFs were planned to be in the SELECT clause, and I'm wondering
whether there were other reasons why UDTFs should be there. With SELECT, it
seemed more straightforward implementation-wise. Also, going back to TRANSFORM,
it does seem like it could fit in the FROM clause too. What was the rationale
for having it in the SELECT clause?

> Add support for user defined table generating functions
> -------------------------------------------------------
>
>                 Key: HIVE-655
>                 URL: https://issues.apache.org/jira/browse/HIVE-655
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Raghotham Murthy
>            Assignee: Paul Yang
>         Attachments: HIVE-655.1.patch, HIVE-655.2.patch
>
>
> Provide a way for users to add a table generating function, i.e., functions 
> that generate multiple rows from a single input row. Currently, the only way 
> to do it is via the TRANSFORM clause which requires streaming the data.

-- 
This message is automatically generated by JIRA.