hhvu0102 opened a new issue, #4074:
URL: https://github.com/apache/texera/issues/4074
### What happened?
When there are multiple input ports, `process_table` is called twice for each
port. For example, using this code:
```
# Imports assumed from the standard Texera Python UDF template.
from pytexera import *
import pandas as pd


class ProcessTableOperator(UDFTableOperator):
    @overrides
    def process_table(self, table: Table, port: int) -> Iterator[Optional[TableLike]]:
        df0 = pd.DataFrame(table)
        print(f'port is {port}')
        df1 = pd.DataFrame(table)
        print(f'port is {port}')
        yield None
```
The `f'port is {port}'` message is printed twice:
<img width="774" height="311" alt="Image"
src="https://github.com/user-attachments/assets/2915907d-6a3b-4e12-a7fc-edd7d0fc1598"
/>
This suggests to me that the function is called as many times as there are
ports, which I don't think is the expected behavior.
Additionally, `port 1` is called before `port 0` because the `port 1` data is
smaller, which Chris and Jiadong mentioned is intended. However, a user who is
unaware of this may write code that reads the data from `port 0` first,
because that looks like the intended order, and that code will fail because
the `port 0` data has not been loaded yet. Currently I use my own wrapper so
that the inputs from each port are recognized correctly, but do you have an
example of how you usually handle this case? I can open a new issue for this
question if that's easier.
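To make the question concrete, here is a minimal sketch of the kind of wrapper I mean. It is plain Python and does not use any Texera API; `PortBuffer` and its `add` method are hypothetical names chosen for illustration, and it assumes each port delivers exactly one table. It stores whatever arrives on each port and only hands back the full set once every port has been seen, so the arrival order no longer matters.

```python
from typing import Any, Dict, Optional


class PortBuffer:
    """Hypothetical helper: collects one input per port, in any arrival order."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self._inputs: Dict[int, Any] = {}

    def add(self, port: int, data: Any) -> Optional[Dict[int, Any]]:
        """Store data for a port; return all inputs once every port has arrived."""
        if not 0 <= port < self.num_ports:
            raise ValueError(f"unexpected port {port}")
        self._inputs[port] = data
        if len(self._inputs) < self.num_ports:
            return None  # still waiting for at least one other port
        # Return the inputs keyed by port number, in ascending port order.
        return {p: self._inputs[p] for p in range(self.num_ports)}


# Simulate port 1 arriving before port 0, as described above.
buffer = PortBuffer(num_ports=2)
first = buffer.add(1, "small table")
second = buffer.add(0, "large table")
print(first)
print(second)
```

Inside `process_table`, the idea would be to call `buffer.add(port, table)` and only run the combined logic when the result is not `None`, yielding `None` otherwise. I am not sure this is the recommended pattern, which is why I am asking.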
### How to reproduce?
Dataset: `samn16081314-downsampled-5k`; workflow:
`PanKbase_example_5k_multi-port`. I have shared these via email with Meng,
Jiadong, Chris, and Chen. Please let me know if I should share them with any
other team members!
To run this workflow, you'll need to install some packages yourself. In
Python:
```
pip install pysam
pip install cellbender
pip install scikit-image
```
In R:
```
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DropletUtils")
```
### Version
1.1.0-incubating (Pre-release/Master)
### Commit Hash (Optional)
_No response_
### What browsers are you seeing the problem on?
_No response_
### Relevant log output
```shell
```