[
https://issues.apache.org/jira/browse/HIVE-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777569#action_12777569
]
Ning Zhang commented on HIVE-655:
---------------------------------
If I understand it correctly, the semantics of "select pageid,
explode(addid_list) from T" is to for each row in T, we will take a row in T
and combine pageid and every item in addid_list to make it a set of rows. This
can be easily translated to something like:
SELECT T.pageid, S.addid FROM T, explode(addid_list) S;
The benefits of this is that you can add WHERE, GROUP BY etc. to the query such
as
SELECT s.addid, t.pageid FROM t, explod(addid_list) S WHERE pageid < 10 GROUP
BY S.addid, T.pageid;
If we put explode in the SELECT clause, we have to use a subquery to wrap up
the explode() and put it into the FROM clause of the outer query. And we have
to make sure the inner subquery has no group by, no aggregation function etc.,
and we have to explain it to the users. That seems unnecessarily complicated.
If we want to follow the SQL's data flow: rows are passed in the order of FROM
--> WHERE --> GROUP BY --> HAVING --> SELECT, I think it's better to put it
into the FROM clause. This is not only the syntax of Oracle's table function,
DB2 also need to reference the user defined table function in the FROM clause
(http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2.udb.doc/ad/t0009218.htm).
This is also in line with the SQL's LATERAL VIEW construct where views defined
in the FROM clause can reference tables appearing before them in the same FROM
clause. It has a well-define semantics and it can be easily composed with other
SQL construct without any exceptions.
But again, if we want backward compatibility I have no objection of putting it
into the SELECT clause, but I think we'd better also support UDTFs in the FROM
clause as well.
> Add support for user defined table generating functions
> -------------------------------------------------------
>
> Key: HIVE-655
> URL: https://issues.apache.org/jira/browse/HIVE-655
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Reporter: Raghotham Murthy
> Assignee: Paul Yang
> Attachments: HIVE-655.1.patch, HIVE-655.2.patch
>
>
> Provide a way for users to add a table generating function, i.e., functions
> that generate multiple rows from a single input row. Currently, the only way
> to do it is via the TRANSFORM clause which requires streaming the data.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.