[
https://issues.apache.org/jira/browse/HIVE-935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783927#action_12783927
]
Paul Yang commented on HIVE-935:
--------------------------------
Some notes from an offline discussion
1. To be similar to other vendors, the alias specified after the lateral view
might be better as a table alias instead of as a column alias. For reference,
take the query:
{code}
SELECT <expressions> FROM example_table LATERAL VIEW explode(adid_list) AS adid
{code}
Whether the alias 'adid' is for table or a column affects which <expressions>
are (in)valid. If adid were a table alias, then the following query would not
work:
{code}
SELECT adid ...
{code}
On the other hand If adid were a column, alias, the following query would not
work:
{code}
SELECT adid.* ...
{code}
In either case, the following query would be valid:
{code}
SELECT adid.col0 ...
{code}
2. Following the notion that the output of the UDTF is a table, the output of
explode() and any other GenericUDTF should be a struct.
3. The operator structure of the lateral view is as follows:
For a query such as
{code}
SELECT pageid, adid.* FROM example_table LATERAL VIEW explode(adid_list) AS adid
{code}
The top of the operator tree will look something like (excuse the ASCII):
{code}
*
[Table Scan]
/ \
[Select](*) [Select](adid_list)
| |
| [UDTF] (explode)
\ /
[Lateral View Join]
|
|
[Select] (pageid, adid.*)
|
....
{code}
Rows from the table scan operator will branch in two ways. The left branch will
contain a single select operator that gets all the columns from the table. The
right branch will only get the columns that should be sent to the following
UDTF operator. The output of both the left and right branches will get sent to
a lateral view join operator that appropriately joins the inputs.
An issue with this setup is that for every row coming from the left branch,
there may be one or more rows (that are the output of the UDTF) coming from the
right branch. The single row from the left branch and the multiple rows from
the right branch must be associated in some way. The proposed solution is to
depend on the positioning of the operators. If the operator DAG is setup as
previously mentioned, then the lateral view join operator should first see a
row from the left branch, followed by rows from the right branch, followed by a
single row from the left branch again, and so on. Hence, the reception of a row
from the left branch can be used as a delimiter.
For multiple lateral views, additional branches will be added from the table
scan operator that include the appropriate select and UDTF operators. A
recursive approach was considered, but may present issues with generating a
Cartesian product
4. With the proposed semantics and operator DAG, the definitions of
GenericUDTF. {process(), close()} must be adjusted slightly.
GenericUDTF.close() should not output additional rows and GenericUDTF.process()
should make all the calls to GenericUDTF.forward() that are necessary for the
given row.
> support LATERAL VIEW
> --------------------
>
> Key: HIVE-935
> URL: https://issues.apache.org/jira/browse/HIVE-935
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Paul Yang
>
> For table functions, the following syntax should be supported
> select ... from T LATERAL VIEW explode(adid_list) as adid,...
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.