InvisibleProgrammer commented on code in PR #6043: URL: https://github.com/apache/hive/pull/6043#discussion_r2322305573
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/OpProcFactory.java:
##########
@@ -676,6 +683,92 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
     }
   }
 
+  /**
+   * PTF processor
+   */
+  public static class PTFLineage implements SemanticNodeProcessor {
+
+    @Override
+    public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs) throws SemanticException {
+      // LineageCTx
+      LineageCtx lCtx = (LineageCtx) procCtx;
+
+      // The operators
+      @SuppressWarnings("unchecked")
+      PTFOperator op = (PTFOperator)nd;
+      Operator<? extends OperatorDesc> inpOp = getParent(stack);
+      lCtx.getIndex().copyPredicates(inpOp, op);
+
+      Dependency dep = new Dependency();
+      DependencyType new_type = DependencyType.EXPRESSION;
+      dep.setType(new_type);
+      // TODO: Fix this to a non-null value. This comment comes from the default implementation (TransformLineage)
+      dep.setExpr(null);
+
+      List<String> columns = new ArrayList<>();
+      PartitionedTableFunctionDef funcDef = op.getConf().getFuncDef();
+
+      if (funcDef.getPartition() != null) {
+        addAllMakeUniqueIfNotNull(columns, funcDef.getPartition().getExpressions().getFirst().getExprNode().getCols());
+      }
+      if (funcDef.getOrder() != null) {
+        addAllMakeUniqueIfNotNull(columns, funcDef.getOrder().getExpressions().getFirst().getExprNode().getCols());
+      }
+
+      if (!(funcDef.getTFunction() instanceof Noop)) {
+
+        if (funcDef instanceof WindowTableFunctionDef
+            && ((WindowTableFunctionDef) funcDef).getWindowFunctions().getFirst().getArgs() != null) {

Review Comment:
No. Take the first example in `lineage.ptf.q`:

```sql
create view b_v_4_0 as
select * from (
  select col_001,
         row_number() over (partition by src.p1) as r_num,
         row_number() over (partition by src.col_002) as r_num2,
         row_number() over (partition by src.col_002) as r_num3
  from source_tbl2 src) v1;
```

The output is:

```
"edges":[
  { "sources":[ 4 ], "targets":[ 0 ], "edgeType":"PROJECTION" },
  { "sources":[ 5 ], "targets":[ 1 ],
    "expression":"(tok_function row_number (tok_windowspec (tok_partitioningspec (tok_distributeby (. (tok_table_or_col src) p1)) (tok_orderby (tok_tabsortcolnameasc (tok_nulls_first (. (tok_table_or_col src) p1))))) (tok_windowrange (preceding unbounded) (following unbounded))))",
    "edgeType":"PROJECTION" },
  { "sources":[ 6 ], "targets":[ 2, 3 ],
    "expression":"(tok_function row_number (tok_windowspec (tok_partitioningspec (tok_distributeby (. (tok_table_or_col src) col_002)) (tok_orderby (tok_tabsortcolnameasc (tok_nulls_first (. (tok_table_or_col src) col_002))))) (tok_windowrange (preceding unbounded) (following unbounded))))",
    "edgeType":"PROJECTION" }
],
"vertices":[
  { "id":0, "vertexType":"COLUMN", "vertexId":"default.b_v_4_0.col_001" },
  { "id":1, "vertexType":"COLUMN", "vertexId":"default.b_v_4_0.r_num" },
  { "id":2, "vertexType":"COLUMN", "vertexId":"default.b_v_4_0.r_num2" },
  { "id":3, "vertexType":"COLUMN", "vertexId":"default.b_v_4_0.r_num3" },
  { "id":4, "vertexType":"COLUMN", "vertexId":"default.source_tbl2.col_001" },
  { "id":5, "vertexType":"COLUMN", "vertexId":"default.source_tbl2.p1" },
  { "id":6, "vertexType":"COLUMN", "vertexId":"default.source_tbl2.col_002" }
]
```

It works this way because the `process` method is called on each node that is a PTF function. So if we add 3 PTF expressions to a select, we get 3 calls.

On the other hand, `getFirst` is enough because there can be only one windowing function in a PTF expression. The same is true for partitions and order by. Take order by as the simplest example: a single order by can contain multiple columns, but a single select cannot have multiple order by clauses.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: gitbox-h...@hive.apache.org