[ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567856#comment-13567856 ]
Prajakta Kalmegh commented on HIVE-896:
---------------------------------------

This is not exactly a bug. In the existing trunk, the ExtractOperator is followed by a FileSinkOperator and hence does not have this problem. For queries like the one below:

select p1.p_mfgr, p1.p_name, p1.p_size
from part p1 join part p2 on p1.p_partkey = p2.p_partkey
distribute by p1.p_mfgr
sort by p1.p_name;

a SelectOperator after the JoinOperator solves this problem by filtering out the virtual columns (VCs) and setting up a correct RowResolver (RR) for the ReduceSinkOperator. We cannot insert a SelectOperator in our case because the PTF chain is a black box to us.

In queries with a PTFOperator, we use the RowResolver of the ExtractOperator to construct ExprNodeDescs during translation. The problem is this: if we do not filter the VCs out of the ExtractOperator's RowResolver and instead use them during translation, ColumnPrunerTableScanProc adds these VCs to its newVirtualCols list. This leaves a non-empty virtualCols list on the TableScanDesc. At runtime, the 'hasVC' flag in the MapOperator is then set to true, which eventually results in a ClassCastException in the ReduceSinkOperator during row evaluation.

This problem arises specifically for queries that combine a join with a PTF (we can walk through some examples offline to explain why it does not occur for queries with a PTF and no join).

So currently, we filter out the VCs and set up a new RowResolver for the ExtractOperator during translation, so that the columns seen at runtime match those used during translation.

> Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
> ---------------------------------------------------------------
>
>                 Key: HIVE-896
>                 URL: https://issues.apache.org/jira/browse/HIVE-896
>             Project: Hive
>          Issue Type: New Feature
>          Components: OLAP, UDF
>            Reporter: Amr Awadallah
>            Priority: Minor
>         Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, Hive-896.2.patch.txt
>
>
> Windowing functions are very useful for click stream processing and similar
> time-series/sliding-window analytics.
> More details at:
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
> -- amr
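To make the virtual-column filtering described in the comment more concrete, here is a minimal toy sketch in Java. It does not use Hive's real classes: Column, Schema and withoutVirtualColumns are hypothetical stand-ins for ColumnInfo, RowResolver and the translation-time filtering step, and the actual code paths (ColumnPrunerTableScanProc, TableScanDesc, MapOperator) are not modeled at all. It only illustrates the idea of building a new row schema without VCs, so that expressions translated against it cannot reference VCs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VirtualColumnFilterSketch {

    // Hypothetical stand-in for per-column metadata (NOT Hive's ColumnInfo API).
    record Column(String alias, String name, boolean isVirtual) {}

    // Hypothetical stand-in for a RowResolver: an ordered schema keyed by "alias.name".
    static final class Schema {
        private final Map<String, Column> columns = new LinkedHashMap<>();

        void put(Column c) {
            columns.put(c.alias() + "." + c.name(), c);
        }

        // Returns a copy of the columns in insertion order.
        List<Column> columns() {
            return new ArrayList<>(columns.values());
        }
    }

    // Builds a NEW schema containing only the non-virtual columns, preserving their order,
    // instead of mutating the input schema in place.
    static Schema withoutVirtualColumns(Schema input) {
        Schema out = new Schema();
        for (Column c : input.columns()) {
            if (!c.isVirtual()) {
                out.put(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Schema extractSchema = new Schema();
        extractSchema.put(new Column("p1", "p_mfgr", false));
        extractSchema.put(new Column("p1", "p_name", false));
        extractSchema.put(new Column("p1", "p_size", false));
        // Sample virtual columns, named after Hive's INPUT__FILE__NAME and BLOCK__OFFSET__INSIDE__FILE.
        extractSchema.put(new Column("p1", "INPUT__FILE__NAME", true));
        extractSchema.put(new Column("p1", "BLOCK__OFFSET__INSIDE__FILE", true));

        Schema filtered = withoutVirtualColumns(extractSchema);
        for (Column c : filtered.columns()) {
            System.out.println(c.alias() + "." + c.name());
        }
    }
}
```

Running main prints only p1.p_mfgr, p1.p_name and p1.p_size: the two VC entries are dropped and the remaining columns keep their original order. Returning a fresh schema, rather than editing the original in place, loosely mirrors "setting up a new RowResolver for the ExtractOperator"; whether Hive's patch does exactly this internally is an assumption.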