Xiao-zhen-Liu commented on issue #4074: URL: https://github.com/apache/texera/issues/4074#issuecomment-3572160296
@hhvu0102 Thank you for the issue. You are trying to use some advanced features of the UDF, and everything you described looks like intended behavior to me.

For the first "print" problem: `process_table(self, table: Table, port: int)` is invoked once for each port, and the `port` parameter holds a single value in each invocation. In your case, `process_table` is invoked for port 1 first. Because your code prints twice inside `process_table`, each invocation prints twice. The behavior of your code is:

- Port 1's logic, which is processed first:
  - `df0` = `port 1`'s table, `print port 1`
  - `df1` = `port 1`'s table, `print port 1`
- Port 0's logic, which is processed second:
  - `df0` = `port 0`'s table, `print port 0`
  - `df1` = `port 0`'s table, `print port 0`

The correct way to write this kind of logic is to branch on `port` (similar to what we mention in the [wiki](https://github.com/apache/texera/wiki/Guide-to-Use-a-Python-UDF#2-in-udf)):

```python
from pytexera import *

class ProcessTableOperator(UDFTableOperator):
    @overrides
    def process_table(self, table: Table, port: int) -> Iterator[Optional[TableLike]]:
        if port == 0:
            # port 0's processing logic goes here (placeholder: pass the table through)
            yield table
        elif port == 1:
            # port 1's processing logic goes here (placeholder: pass the table through)
            yield table
```

For the second problem, about input order: you can specify the order of the two inputs by adding a dependency from port 1 to port 0 (see the gif below). This tells the engine to process port 0 before port 1.
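To make the two points concrete, here is a minimal, self-contained sketch that runs without Texera. It assumes a common use case: port 0's table is stored on `self` and combined with port 1's table when port 1 arrives. `TwoPortOperator`, the list-of-dicts `Table`, and the `run` helper are all hypothetical stand-ins, not pytexera's API. `run` only imitates the engine calling `process_table` once per port in a given order. In Texera itself, that order comes from the dependency you draw in the workflow, not from an `order` argument.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical stand-in for a Texera table: a list of row dicts.
Table = List[dict]


class TwoPortOperator:
    """Sketch of the per-port branching pattern; NOT pytexera's real base class."""

    def __init__(self) -> None:
        self.df0: Optional[Table] = None  # port 0's table, kept for port 1 to use
        self.log: List[str] = []          # which port each invocation handled

    def process_table(self, table: Table, port: int) -> Iterator[Table]:
        # Called once per port; `port` is a single value in each call.
        if port == 0:
            # Port 0's logic: remember its table and emit nothing yet.
            self.log.append("port 0")
            self.df0 = table
        elif port == 1:
            # Port 1's logic: relies on port 0 having been processed first.
            self.log.append("port 1")
            if self.df0 is None:
                raise RuntimeError("port 1 was processed before port 0")
            yield self.df0 + table  # combine both inputs


def run(op: TwoPortOperator, inputs: Dict[int, Table], order: List[int]) -> List[Table]:
    """Hypothetical engine stand-in: invokes process_table once per port, in `order`."""
    outputs: List[Table] = []
    for port in order:
        outputs.extend(op.process_table(inputs[port], port))
    return outputs


op = TwoPortOperator()
print(run(op, {0: [{"id": 1}], 1: [{"id": 2}]}, order=[0, 1]))
# prints [[{'id': 1}, {'id': 2}]]
```

If you call `run` with `order=[1, 0]`, it raises `RuntimeError` instead. This shows why declaring the dependency from port 1 to port 0 matters when port 1's logic reads data from port 0.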
