[ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567856#comment-13567856 ]
Prajakta Kalmegh commented on HIVE-896:
---------------------------------------
This is not exactly a bug. In the existing trunk, the ExtractOperator is
followed by a FileSinkOperator and hence does not have this problem. For a
query like the one below:
select p1.p_mfgr, p1.p_name, p1.p_size
from part p1 join part p2 on p1.p_partkey = p2.p_partkey
distribute by p1.p_mfgr
sort by p1.p_name;
a SelectOperator inserted after the JoinOperator solves this problem by
filtering out the virtual columns (VCs) and setting up a correct RowResolver
(RR) for the ReduceSinkOperator. We cannot insert a SelectOperator in our case
because the PTF chain is a black box to us.
In queries with the PTFOperator, we use the RowResolver of the ExtractOperator
to construct ExprNodeDescs during translation. The problem is this: if we do
not filter the VCs out of the ExtractOperator's RowResolver and instead use
them during translation, ColumnPrunerTableScanProc adds these VCs to the
newVirtualCols list, which leaves a non-empty virtualCols on the
TableScanDesc. At runtime, the MapOperator then sets its 'hasVC' boolean to
true, which eventually results in a ClassCastException in the
ReduceSinkOperator during row evaluation. This problem occurs specifically for
queries that combine a join with a PTF (we can walk through some examples
offline to explain why it is not a problem for queries with a PTF and no
join). So currently we filter out the VCs and set up a new RowResolver for the
ExtractOperator during translation, so that the columns at runtime match those
seen during translation.
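
To make the filtering step concrete, here is a minimal, self-contained sketch
of the idea. It is not the code from the patch and does not use Hive's
RowResolver API: ColumnRef, VIRTUAL_COLUMNS and withoutVirtualColumns are
hypothetical names made up for illustration, and the two virtual column names
are only sample data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class VirtualColumnFilter {
    // Sample virtual column names; an assumption for this sketch, not a full list.
    static final Set<String> VIRTUAL_COLUMNS = new HashSet<String>(Arrays.asList(
            "INPUT__FILE__NAME", "BLOCK__OFFSET__INSIDE__FILE"));

    // Hypothetical stand-in for one resolver entry: table alias plus column name.
    static final class ColumnRef {
        final String alias;
        final String name;

        ColumnRef(String alias, String name) {
            this.alias = alias;
            this.name = name;
        }

        boolean isVirtual() {
            return VIRTUAL_COLUMNS.contains(name.toUpperCase(Locale.ROOT));
        }

        @Override
        public String toString() {
            return alias + "." + name;
        }
    }

    // Builds a new column list without the VCs; the input list is not modified.
    static List<ColumnRef> withoutVirtualColumns(List<ColumnRef> columns) {
        List<ColumnRef> kept = new ArrayList<ColumnRef>();
        for (ColumnRef column : columns) {
            if (!column.isVirtual()) {
                kept.add(column);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Columns as they might be visible after the join, VCs included.
        List<ColumnRef> beforeFiltering = Arrays.asList(
                new ColumnRef("p1", "p_mfgr"),
                new ColumnRef("p1", "p_name"),
                new ColumnRef("p1", "p_size"),
                new ColumnRef("p1", "BLOCK__OFFSET__INSIDE__FILE"),
                new ColumnRef("p1", "INPUT__FILE__NAME"));

        // Prints [p1.p_mfgr, p1.p_name, p1.p_size]
        System.out.println(withoutVirtualColumns(beforeFiltering));
    }
}
```

Building a fresh list rather than mutating the input mirrors the approach
described above, where a new RowResolver is set up for the ExtractOperator
instead of the existing one being edited in place.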
> Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
> ---------------------------------------------------------------
>
> Key: HIVE-896
> URL: https://issues.apache.org/jira/browse/HIVE-896
> Project: Hive
> Issue Type: New Feature
> Components: OLAP, UDF
> Reporter: Amr Awadallah
> Priority: Minor
> Attachments: DataStructs.pdf, HIVE-896.1.patch.txt,
> Hive-896.2.patch.txt
>
>
> Windowing functions are very useful for click stream processing and similar
> time-series/sliding-window analytics.
> More details at:
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
> -- amr