Xiao-zhen-Liu commented on issue #4074:
URL: https://github.com/apache/texera/issues/4074#issuecomment-3572160296

   @hhvu0102 Thank you for the issue. You are trying to use some advanced 
features of the UDF. Everything you described looks like intended behavior to me.
   
   For the first "print" problem:
   
   `process_table(self, table: Table, port: int)` is invoked once for each 
port, and the `port` parameter holds a single value for each invocation. In your 
case, `process_table` is invoked for port 1 first, and since your code 
prints twice inside `process_table`, port 1 is printed twice. 
The behavior of your code is:
   
   - port 1's logic, which is processed first:
   `df0` = `port 1`'s table
   `print port 1`
   `df1` = `port 1`'s table
   `print port 1`
   - port 0's logic, which is processed second:
   `df0` = `port 0`'s table
   `print port 0`
   `df1` = `port 0`'s table
   `print port 0`
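
The trace above can be reproduced with a small stand-alone simulation in plain Python. This is only a sketch and does not use the Texera API: `FakeOperator` and `run_engine` are hypothetical stand-ins, and a list of dicts plays the role of `Table`. It illustrates that each call receives one table and one `port` value, so two unconditional prints in the body fire on every call.

```python
from typing import Dict, Iterator, List

Rows = List[dict]  # hypothetical stand-in for Texera's Table


class FakeOperator:
    """Mimics a UDF whose body ignores which port it was called for."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def process_table(self, table: Rows, port: int) -> Iterator[Rows]:
        df0 = table  # always the table of the current port
        self.log.append(f"print port {port}")
        df1 = table  # the same table again, not the other port's table
        self.log.append(f"print port {port}")
        yield df0


def run_engine(op: FakeOperator, inputs: Dict[int, Rows], order: List[int]) -> None:
    """Hypothetical driver: calls process_table once per port, in the given order."""
    for port in order:
        for _ in op.process_table(inputs[port], port):
            pass  # drain the generator so the body actually runs


op = FakeOperator()
run_engine(op, {0: [{"a": 1}], 1: [{"b": 2}]}, order=[1, 0])
print(op.log)
```

With port 1 processed first, the log contains two entries for port 1 followed by two entries for port 0, matching the behavior described above.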
   
   The correct way to write this kind of logic is (similar to what we 
mentioned in the 
[wiki](https://github.com/apache/texera/wiki/Guide-to-Use-a-Python-UDF#2-in-udf)):
   
   ```
   @overrides
   def process_table(self, table: Table, port: int) -> Iterator[Optional[TableLike]]:
       if port == 0:
           # port 0's processing logic
           ...
       elif port == 1:
           # port 1's processing logic
           ...
   ```
   
   For the second problem about input order:
   
   You can set the order of the two inputs by specifying a dependency from 
port 1 to port 0 (see the GIF below). This tells the engine to process port 
0 before port 1.
   
   
![Image](https://github.com/user-attachments/assets/38c186bd-9760-4ccc-b4b4-f7d551f98338)
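
As a rough illustration of why the input order matters, the sketch below buffers port 0's data and uses it when port 1 arrives. It is a plain-Python simulation, not Texera code: `LookupJoinSketch`, the `id`/`name`/`v` fields, and the loop that plays the engine are all hypothetical, and a list of dicts stands in for `Table`.

```python
from typing import Iterator, List, Optional

Rows = List[dict]  # hypothetical stand-in for Texera's Table


class LookupJoinSketch:
    """Hypothetical two-input UDF: port 0 is a lookup table, port 1 is the data to enrich."""

    def __init__(self) -> None:
        self.lookup: Optional[dict] = None

    def process_table(self, table: Rows, port: int) -> Iterator[Optional[Rows]]:
        if port == 0:
            # Port 0 arrives first: remember it and emit nothing yet.
            self.lookup = {row["id"]: row["name"] for row in table}
            yield None
        elif port == 1:
            # Port 1 arrives second: port 0's data must already be buffered.
            if self.lookup is None:
                raise RuntimeError("port 1 was processed before port 0")
            yield [{**row, "name": self.lookup.get(row["id"])} for row in table]


op = LookupJoinSketch()
outputs = []
# Simulated engine calls: port 0 before port 1, as the dependency requests.
calls = [
    (0, [{"id": 1, "name": "x"}]),
    (1, [{"id": 1, "v": 10}, {"id": 2, "v": 20}]),
]
for port, table in calls:
    outputs.extend(out for out in op.process_table(table, port) if out is not None)
print(outputs)
```

If the two calls were swapped, the `port == 1` branch would raise, which is the situation the dependency from port 1 to port 0 is meant to rule out.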


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
