[jira] Commented: (HIVE-655) Add support for user defined table generating functions

Ning Zhang (JIRA) Fri, 13 Nov 2009 09:26:17 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777569#action_12777569
 ]


Ning Zhang commented on HIVE-655:
---------------------------------

If I understand it correctly, the semantics of "select pageid, 
explode(addid_list) from T" is to for each row in T, we will take a row in T 
and combine pageid and every item in addid_list to make it a set of rows. This 
can be easily translated to something like:

 SELECT T.pageid, S.addid FROM T, explode(addid_list) S; 

The benefits of this is that you can add WHERE, GROUP BY etc. to the query such 
as

  SELECT s.addid, t.pageid FROM t, explod(addid_list) S WHERE pageid < 10 GROUP 
BY S.addid, T.pageid;

If we put explode in the SELECT clause, we have to use a subquery to wrap up 
the explode() and put it into the FROM clause of the outer query. And we have 
to make sure  the inner subquery has no group by, no aggregation function etc., 
and we have to explain it to the users.  That seems unnecessarily complicated. 

If we want to follow the SQL's data flow: rows are passed in the order of FROM 
--> WHERE --> GROUP BY --> HAVING --> SELECT, I think it's better to put it 
into the FROM clause. This is not only the syntax of Oracle's table function, 
DB2 also need to reference the user defined table function in the FROM clause 
(http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/ad/t0009218.htm).
 This is also in line with the SQL's LATERAL VIEW construct where views defined 
in the FROM clause can reference tables appearing before them in the same FROM 
clause. It has a well-define semantics and it can be easily composed with other 
SQL construct without any exceptions. 

But again, if we want backward compatibility I have no objection of putting it 
into the SELECT clause, but I think we'd better also support UDTFs in the FROM 
clause as well.

> Add support for user defined table generating functions
> -------------------------------------------------------
>
>                 Key: HIVE-655
>                 URL: https://issues.apache.org/jira/browse/HIVE-655
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Raghotham Murthy
>            Assignee: Paul Yang
>         Attachments: HIVE-655.1.patch, HIVE-655.2.patch
>
>
> Provide a way for users to add a table generating function, i.e., functions 
> that generate multiple rows from a single input row. Currently, the only way 
> to do it is via the TRANSFORM clause which requires streaming the data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-655) Add support for user defined table generating functions

Reply via email to