[ https://issues.apache.org/jira/browse/HIVE-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-4126: ------------------------------ Attachment: HIVE-4126.D9105.2.patch hbutani updated the revision "HIVE-4126 [jira] remove support for lead/lag UDFs outside of UDAF args". - add PTF clause to grammar - PTF Test Queries - Query Data - corrected data file name - Merge branch 'hive-896' of github.com:hbutani/hive into hive-896 - windowing + hive attempt - Hooking QueryDef to QB - ptf source is a subQuery, not a select statement - add windowing clauses in grammar - fix grammar exception issue - associate PTF nodes with corresponding Insert node, if any - minor grammar fixes - flush out processing PTF tree in phase1 - associate PTFs in dest node handling in Phase1 - Classes not needed - Merge branch 'crook' of github.com:hbutani/hive into crook - Merge with apache hive - handle SortBy, Having and Window clauses in Phase1 - remove ambiguities in Hive.g - Merge branch 'crook' of https://github.com/hbutani/hive into crook - syntactically allow a window specification in a selectItem - tweak QuerySpec building. - check that there is no GBy and where when deciding if a Windowing - Merge branch 'crook' of https://github.com/hbutani/hive into crook - Add several checks: - QSpec to QDef (Checked qdef serialization and deserialization: it works) - refactor the PTF ifc. - refactor: rename annotation classes to end in Description - refactor: move annotation classes to ql.exec package, where the - refactor: move window functions to ql.udf.generic package - refcator: move GenericUDFLeadLag to ql.udf.generic - refactor: move TablFunc bases classes to ql.udf.ptf - refactor: move PTblFuncs to ql.udf.ptf - refactor: move Order enum to ptf.query.spec package, so that i can - refactor: remove classes in ptf.metadata package. Not needed now - refactor: FunctionRegistry, extract FunctionInfo classes; step 1 in - fix logic that checks for windowing specifications in Select List - reenable ensurePTFChainHasPartitioning; - Added aliasToAST map: To setup expressions map in PTF's output - Added AST expression: Used to populate PTF RowResolver's expression map - Using input operator's RowResolver to construct OI for HiveTableDef - Use of SCRIPT DependencyType for processing rule on PTFOperator. - Cleanup: Changed prototype of translate method in Translator. Commented - Add utility methods: - Add method to get operator name in PTFOperator. - 1. Translation of QuerySpec to QueryDef - Merge branch 'hive-896' into crook - Following changes: - refactor genPTFPlan: - minor bug: clear Agg. and Distinct Agg. lists in QB ParseInfo. - flush out SelectDef translation: - introduce initializeOutputOI and initializeRawInputOI to TableFunction - When constructing the RowResolver for the Windowing or Noop PTFs: - add the columns from the last PTF, before adding any - 1. Change logic of how/which TableFunc is added to a QuerySpec: if query - During QDef deserialization use the passed in inputOI as the OI of the - for subQueries as input to PTF, construct a HiveTableSpec. - Tests successful for queries with: windowing, lead/lag, noop, gby, having, join with lead/lag, join with noop - support an alias for a PTF invocation. This is needed so that a PTF - add test for alias in ptf invocation - translate ptf invocations in the from clause (that are not associated - add tests that exercise generation of separate PTFOps for PTFChain and - Fix aggregations bug: move aggregation expressions from aggregationTrees to PTF QuerySpec if no group by clause is seen at the end of phase 1 - add support for PTF invocation in joins. - adding tests for ptf invocation in joins - handle mixed case aliases. - mixed case alias test - Create PTF Map-side RR: - Merge branch 'hive-896' into crook - fix Hive.g merge issue: duplicate KW_ROWS definition - Having: Tests to support having with windowing and ptf in queries with no group by. - - during handleClusterOrDistributeByForWindowing invoke the - Merge branch 'crook' of https://github.com/hbutani/hive into crook - add tests to check - when extracting Windowing clauses from selectList handle the case - when extracting Windowing clauses from selectList handle the case - More tests with UDAFs, statistical and distribution functions - No need to specify Writable option to copy object. - Following changes: - Tests: - disallow Count/Sum distinct with windowing - refactor ptf.translate package: - Merge branch 'ptf' of https://github.com/hbutani/hive into ptf - refactor ptf.query.specification package - Merge branch 'ptf' of https://github.com/hbutani/hive into ptf - refactor ptf.query.definition package - Get rid of ql.ptf.functions package - move PTFSpec to hive.ql.parse package - move PTFDef to hive.ql.plan package - rename QueryInput Def & Spec class names to better reflect their - refactor ptf.runtime package - Setup PTFSpec for QB at the end of phase I for cases where it is not already done. - Rollback change for os.family - Remove individual test files: all test queries are in - get rid ptf.io package - minor cleanup in ptf.ds package - Merge branch 'ptf' of https://github.com/hbutani/hive into ptf - remove ptf.utils.HiveUtils - cleanup of ptf.query.translate package - merge with hive - cleanup and document data struct additions on SemAly, QB, QBParseInfo. - Changed error message for negative test: ptf_negative_NoSortNoDistByClause - Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf - Remove NPath code - Merge with PTF HEAD - Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf - Allow use of constant expressions in select clause for queries with No GBY, No PTF and No windowing - cleanup: get rid of WindowingTypeCheckFactory; use hive's - check for errors from TypeCheckFactory when building - allow Windowing invocations w/o aliases - cleanup: move remaining ParseUtils function to PTFTranslator - cleanup: - cleanup: move PTFPartition to ql.exec package - carry forward the expression mappings in the RRs constructed for PTFs - use column position to generate internal names for PTF Op's RR. - Resolve merge conflicts - Normalize line endings - Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf - recover changes to PTFOp lost because of merge and CRLF issues - support different UDAF invocations on the same UDAF but different - fix range based scanner and add range based window tests - - set the first Windowing clause encountered in a UDAF invocation as the - Merge branch 'ptf' of https://github.com/hbutani/hive into ptf - Account for empty partitions while closing the PTFOperator - support different literal types in constant expressions in select list - fix merge issues due to CRLF - add tests for Partition & Order specs specified - rules for inferring the default Partitioning Spec. - To avoid losing changes due to CRLF issue - Do not process queries with constants in select and let Hive handle them - redo check if a SelectList constitutes a valid GBy query. - Resolve merge conflicts with apache_hive - Compare expressions trees using toStringTree() in translateOrder - for distinct queries filter out exprs handled by windowing - Merge branch 'ptf' of https://github.com/hbutani/hive into ptf - when validating a SelectList for GBy, account for FUNCTION_DI tokens - add test for select distinct + windowing - get rid of WindowingException; - cleanup ptf.utils.Utils and ptf.query.SerializationUtils - Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf - fix merge issue: we already had the methods in ASTNode for antlr3.4 - clean ptf.ds package; move code to PTFPersistence class - Fix negative tests with new messages (WindowingException removed) - add a way to disambiguate between sort expressions and fn. args - expose the PTFInfo via the PTFDef; so the RR can be used for - change ptf invocation so args come before table and partitioning spec. - get rid of ptf.Constants class; add new ConfVars - Support multi-operator function chain + tests - fix query componentization logic. - initial checkin of reviving NPath. - Add more tests for query componentization - undo inadvertant change made to .gitignore - change to PTF ifc. A TableFunc is now also responsible for the names of - finish NPath - Merge branch 'ptf' of https://github.com/hbutani/hive into ptf - Cleanup: remove unused methods in PTFTranslator, remove equals methods - merge with hive - fix merge issue in SemanticAnalyzer - Fix bug: Allow partitioning spec for functions that do not support windowing - remove dependency on SemanticAnalyzer in PTFTranslator. - add OperatorType and support PTFOperator for explain plan - Merge branch 'ptf' of github.com:hbutani/hive into ptf - Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf - Merge remote-tracking branch 'remotes/apache_hive/trunk' into ptf - Resolve Merge conflicts with apache_hive/ptf_windowing - refactored spec classes - refactored PTFDesc; for now called PTFDesc2 - refactored translation of PTF Chain. - refactor Windowing translation - simplify RowResolver creation. - refactored ptfDesc deserializer - refactor SemanticAnalyzer: - refactor PTFOp and function classes to use new data structs. - When executing the WdwTblFunc: - the first Arg of a Lead/Lag function can refer to windowingFns, - construct PTF RR using alias specified with PTF invocation - semAly windowingSpec was not being set on the current QB - For windowing set the OI for output from Wdw Processing to exactly as - when setting default PartSpec in WdwTabFn don't clear default Order - add Windowing Exprs to qbp.destToWindowingExprs; used to filter out - setHasWindowing not set in moveaggregationExprsToWindowingSpec - commit with hive - rename ranking functions in PTFTrans2 - changes to ptf_general_queries.q - fix sort by in queries 62, 63 - remove suffix 2 from new classes - add apache license comment to new classes. - Change get/settFunction to get/setTFunction in PTFDesc - Fix range condition in PTFPartition: without this change the tests do not run from CLI - Modify rc_file and seq_file tests to specify dist/sort condition with windowing - fix Deserilizer bugs - fix npath deserialization - Merge remote-tracking branch 'remotes/hive/ptf-windowing' into ptf - remove dependency on FuncRegistry in PTFDeserializer for PTFs - Add fix for constructing Extract Operator RR during windowing plan generation + Add test 50 to ptf_general_queries.q - allow wdw fn refernces in wdw expressions - rewrite .q and .out file - add apache license - apache license headers - rebuild RR for Noop/NoopMap even when no there is no alias. Input's - add comments to PTFOp, WdwFuncDesc - fix lint issues - Merge remote-tracking branch 'remotes/apache_hive/ptf-windowing' into ptf - Fix negative tests - Merge remote-tracking branch 'remotes/apache_hive/ptf-windowing' into ptf - Column Pruner support for PTFOperator: For HIVE-4035 - minor changes and document PTF ColumnPruner - Merge remote-tracking branch 'remotes/apache_hive/ptf-windowing' into ptf - Remove setting the hive.ptf.partition.persistence.memsize in ptf_general_queries.q - Merge branch 'ptf' of github.com:hbutani/hive into ptf - merge with hive - Resolve merge conflicts with apache_hive/ptf-windowing - Add new tests on Lead to ptf_general_queries.q + fix names - Resolve Merge conflicts with ptf-windowing + Renumber last 3 test cases - Resolve newline issues - Merge remote-tracking branch 'remotes/hive/ptf-windowing' into ptf - Merge remote-tracking branch 'hive/ptf-windowing' into ptf - HIVE-4082: Refactor tests - Resolve merge issues after merge with ptf-windowing - Resolve merge conflicts with ptf-windowing - Merge remote-tracking branch 'apache_hive/ptf-windowing' into ptf - Merge remote-tracking branch 'apache_hive/ptf-windowing' into ptf - Merge remote-tracking branch 'remotes/apache_hive/ptf-windowing' into ptf - HIVE-4126 [jira] remove support for lead/lag UDFs outside of UDAF args Reviewers: JIRA, ashutoshc REVISION DETAIL https://reviews.facebook.net/D9105 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D9105?vs=29205&id=29415#toc AFFECTED FILES common/src/java/org/apache/hadoop/hive/conf/HiveConf.java data/files/flights_tiny.txt data/files/part.rc data/files/part.seq ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ql/src/test/queries/clientpositive/leadlag.q ql/src/test/queries/clientpositive/leadlag_queries.q ql/src/test/queries/clientpositive/ptf.q ql/src/test/queries/clientpositive/windowing.q ql/src/test/queries/clientpositive/windowing_expressions.q ql/src/test/results/clientpositive/leadlag.q.out ql/src/test/results/clientpositive/leadlag_queries.q.out ql/src/test/results/clientpositive/ptf.q.out ql/src/test/results/clientpositive/windowing.q.out ql/src/test/results/clientpositive/windowing_expressions.q.out To: JIRA, ashutoshc, hbutani > remove support for lead/lag UDFs outside of UDAF args > ----------------------------------------------------- > > Key: HIVE-4126 > URL: https://issues.apache.org/jira/browse/HIVE-4126 > Project: Hive > Issue Type: Bug > Components: PTF-Windowing > Reporter: Harish Butani > Assignee: Harish Butani > Attachments: HIVE-4126.D9105.1.patch, HIVE-4126.D9105.2.patch > > > Select Expressions such as > p_size - lead(p_size,1) > are currently handled as non aggregation expressions done after all over > clauses are evaluated. > Once we allow different partitions in a single select list(Jira 4041), these > become ambiguous. > - the equivalent way to do such things is either to use lead/lag UDAFs with > expressions ( support added with Jira 4081) > - stack windowing clauses with inline queries. select lead(r,1).. from > (select rank() as r....)... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira