InvisibleProgrammer commented on code in PR #6043:
URL: https://github.com/apache/hive/pull/6043#discussion_r2322305573


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/OpProcFactory.java:
##########
@@ -676,6 +683,92 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
     }
   }
 
+  /**
+   * PTF processor
+   */
+  public static class PTFLineage implements SemanticNodeProcessor {
+
+    @Override
+    public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx, Object... nodeOutputs) throws SemanticException {
+      // LineageCTx
+      LineageCtx lCtx = (LineageCtx) procCtx;
+
+      // The operators
+      @SuppressWarnings("unchecked")
+      PTFOperator op = (PTFOperator)nd;
+      Operator<? extends OperatorDesc> inpOp = getParent(stack);
+      lCtx.getIndex().copyPredicates(inpOp, op);
+
+      Dependency dep = new Dependency();
+      DependencyType new_type = DependencyType.EXPRESSION;
+      dep.setType(new_type);
+      // TODO: Fix this to a non-null value. This comment comes from the default implementation (TransformLineage)
+      dep.setExpr(null);
+
+      List<String> columns = new ArrayList<>();
+      PartitionedTableFunctionDef funcDef = op.getConf().getFuncDef();
+
+      if (funcDef.getPartition() != null) {
+        addAllMakeUniqueIfNotNull(columns, funcDef.getPartition().getExpressions().getFirst().getExprNode().getCols());
+      }
+      if (funcDef.getOrder() != null) {
+        addAllMakeUniqueIfNotNull(columns, funcDef.getOrder().getExpressions().getFirst().getExprNode().getCols());
+      }
+
+      if (!(funcDef.getTFunction() instanceof Noop)) {
+
+          if (funcDef instanceof WindowTableFunctionDef
+                  && ((WindowTableFunctionDef) funcDef).getWindowFunctions().getFirst().getArgs() != null) {

Review Comment:
   No. Look at the first example in `lineage.ptf.q`:
   ```sql
   create view b_v_4_0 as
   select *
   from (select col_001,
       row_number() over (partition by src.p1) as r_num,
       row_number() over (partition by src.col_002) as r_num2,
       row_number() over (partition by src.col_002) as r_num3
   
           from source_tbl2 src) v1;
   ```
   The output is: 
   ```
   "edges":[
         {
            "sources":[
               4
            ],
            "targets":[
               0
            ],
            "edgeType":"PROJECTION"
         },
         {
            "sources":[
               5
            ],
            "targets":[
               1
            ],
            "expression":"(tok_function row_number (tok_windowspec (tok_partitioningspec (tok_distributeby (. (tok_table_or_col src) p1)) (tok_orderby (tok_tabsortcolnameasc (tok_nulls_first (. (tok_table_or_col src) p1))))) (tok_windowrange (preceding unbounded) (following unbounded))))",
            "edgeType":"PROJECTION"
         },
         {
            "sources":[
               6
            ],
            "targets":[
               2,
               3
            ],
            "expression":"(tok_function row_number (tok_windowspec (tok_partitioningspec (tok_distributeby (. (tok_table_or_col src) col_002)) (tok_orderby (tok_tabsortcolnameasc (tok_nulls_first (. (tok_table_or_col src) col_002))))) (tok_windowrange (preceding unbounded) (following unbounded))))",
            "edgeType":"PROJECTION"
         }
      ],
      "vertices":[
         {
            "id":0,
            "vertexType":"COLUMN",
            "vertexId":"default.b_v_4_0.col_001"
         },
         {
            "id":1,
            "vertexType":"COLUMN",
            "vertexId":"default.b_v_4_0.r_num"
         },
         {
            "id":2,
            "vertexType":"COLUMN",
            "vertexId":"default.b_v_4_0.r_num2"
         },
         {
            "id":3,
            "vertexType":"COLUMN",
            "vertexId":"default.b_v_4_0.r_num3"
         },
         {
            "id":4,
            "vertexType":"COLUMN",
            "vertexId":"default.source_tbl2.col_001"
         },
         {
            "id":5,
            "vertexType":"COLUMN",
            "vertexId":"default.source_tbl2.p1"
         },
         {
            "id":6,
            "vertexType":"COLUMN",
            "vertexId":"default.source_tbl2.col_002"
         }
      ]
   ```
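   In the output above, `r_num2` and `r_num3` (targets 2 and 3) share a single edge because they have the same source (`col_002`, vertex 6) and the same expression. The following is a minimal sketch of that grouping. It assumes edges are merged whenever their sources and expression are equal. `Edge` and `mergeEdges` are hypothetical names, not Hive's lineage logger API, and the expressions are shortened placeholders for the AST strings shown above.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.Set;
   import java.util.TreeSet;

   public class EdgeMergeSketch {

     /** Hypothetical edge model: source vertex ids, target vertex ids, optional expression. */
     record Edge(Set<Integer> sources, Set<Integer> targets, String expression) {}

     /** Merges per-target edges that share the same sources and expression into one edge. */
     static List<Edge> mergeEdges(List<Edge> perTarget) {
       // LinkedHashMap keeps edges in first-seen order.
       Map<List<Object>, Edge> merged = new LinkedHashMap<>();
       for (Edge e : perTarget) {
         // Arrays.asList (unlike List.of) accepts the null expression of a plain projection.
         List<Object> key = Arrays.asList(e.sources(), e.expression());
         Edge existing = merged.get(key);
         if (existing == null) {
           // Copy into mutable sets so later targets can be added.
           merged.put(key, new Edge(new TreeSet<>(e.sources()), new TreeSet<>(e.targets()), e.expression()));
         } else {
           existing.targets().addAll(e.targets());
         }
       }
       return new ArrayList<>(merged.values());
     }

     public static void main(String[] args) {
       String exprP1 = "row_number() over (partition by src.p1)";
       String exprCol002 = "row_number() over (partition by src.col_002)";
       List<Edge> perTarget = List.of(
           new Edge(Set.of(4), Set.of(0), null),        // col_001 -> b_v_4_0.col_001
           new Edge(Set.of(5), Set.of(1), exprP1),      // p1 -> r_num
           new Edge(Set.of(6), Set.of(2), exprCol002),  // col_002 -> r_num2
           new Edge(Set.of(6), Set.of(3), exprCol002)); // col_002 -> r_num3
       for (Edge e : mergeEdges(perTarget)) {
         System.out.println(e.sources() + " -> " + e.targets());
       }
       // Prints:
       // [4] -> [0]
       // [5] -> [1]
       // [6] -> [2, 3]
     }
   }
   ```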
   
   It works this way because the `process` method is called once for each node that is a PTF operator. So if a select contains three PTF expressions, `process` is called three times.
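   The following is a toy model of this per-node dispatch. It assumes, as described above, one PTF node per windowing expression. A processor registered for the `PTF` node type runs once per matching node. `OpNode` and `walk` are hypothetical and only illustrate the idea. This is not Hive's graph walker, which traverses an operator DAG using rule-based matching rather than a flat list.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Map;
   import java.util.function.Function;

   public class PerNodeDispatchSketch {

     /** Hypothetical operator node: only a type name and a label. */
     record OpNode(String type, String label) {}

     /** Visits nodes in order and invokes the processor registered for each node's type. */
     static List<String> walk(List<OpNode> plan, Map<String, Function<OpNode, String>> processors) {
       List<String> results = new ArrayList<>();
       for (OpNode node : plan) {
         Function<OpNode, String> proc = processors.get(node.type());
         if (proc != null) {
           // One call per matching node, analogous to PTFLineage.process per PTF operator.
           results.add(proc.apply(node));
         }
       }
       return results;
     }

     public static void main(String[] args) {
       List<OpNode> plan = List.of(
           new OpNode("TS", "source_tbl2"),
           new OpNode("PTF", "r_num"),
           new OpNode("PTF", "r_num2"),
           new OpNode("PTF", "r_num3"),
           new OpNode("SEL", "v1"));
       Map<String, Function<OpNode, String>> processors =
           Map.of("PTF", n -> "processed PTF for " + n.label());
       // Three PTF nodes, so the PTF processor runs three times.
       walk(plan, processors).forEach(System.out::println);
     }
   }
   ```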
   On the other hand, `getFirst()` is enough because there can be only one windowing function in a PTF expression. The same holds for partitioning and ordering. Take ORDER BY as the simplest example: a single ORDER BY can contain multiple columns, but a select cannot have multiple ORDER BY clauses.
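   Following this reasoning, the columns for one PTF node come from its single partition spec and its single order spec. Each spec may reference several columns. The diff collects them with `addAllMakeUniqueIfNotNull`, but that helper's body is not shown in this hunk. The following sketch uses a hypothetical stand-in that skips nulls and duplicates, with column lists written out by hand instead of taken from `getCols()`.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   public class ColumnCollectSketch {

     /**
      * Hypothetical stand-in for the PR's addAllMakeUniqueIfNotNull (its real body is not in
      * this diff): appends each non-null column from source that target does not yet contain.
      */
     static void addAllMakeUniqueIfNotNull(List<String> target, List<String> source) {
       if (source == null) {
         return;
       }
       for (String col : source) {
         if (col != null && !target.contains(col)) {
           target.add(col);
         }
       }
     }

     public static void main(String[] args) {
       // One PARTITION BY and one ORDER BY per window spec; each may name several columns.
       List<String> partitionCols = List.of("p1");
       List<String> orderCols = List.of("p1", "col_002");
       List<String> columns = new ArrayList<>();
       addAllMakeUniqueIfNotNull(columns, partitionCols);
       addAllMakeUniqueIfNotNull(columns, orderCols);
       System.out.println(columns); // [p1, col_002]
     }
   }
   ```

   The list keeps first-seen order, so a column that appears in both specs, such as `p1` here, is reported only once as a lineage source.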
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org