AnzhiZhang opened a new pull request, #3657: URL: https://github.com/apache/texera/pull/3657
There is an example Python UDF generator in the [wiki](https://github.com/apache/texera/wiki/Guide-to-Use-a-Python-UDF#1-out-udf). https://github.com/apache/texera/blob/12849accf7a1734ba0fd7feeabbf4df9e0bff812/core/amber/src/main/python/pytexera/udf/examples/generator_operator.py#L21-L24 However, this will raise an error `TypeError: Unmatched type for field 'test', expected AttributeType.INT, got [1, 2, 3] (<class 'list'>) instead.` ``` 2025-08-12 22:15:42.264 | ERROR | core.runnables.data_processor:process_internal_marker:83 - Unmatched type for field 'test', expected AttributeType.INT, got [1, 2, 3] (<class 'list'>) instead. Traceback (most recent call last): File "/opt/anaconda3/envs/texera/lib/python3.10/threading.py", line 973, in _bootstrap self._bootstrap_inner() │ └ <function Thread._bootstrap_inner at 0x1021c6a70> └ <Thread(data_processor_thread, started daemon 13440872448)> File "/opt/anaconda3/envs/texera/lib/python3.10/threading.py", line 1016, in _bootstrap_inner self.run() │ └ <function Thread.run at 0x1021c67a0> └ <Thread(data_processor_thread, started daemon 13440872448)> File "/opt/anaconda3/envs/texera/lib/python3.10/threading.py", line 953, in run self._target(*self._args, **self._kwargs) │ │ │ │ │ └ {} │ │ │ │ └ <Thread(data_processor_thread, started daemon 13440872448)> │ │ │ └ () │ │ └ <Thread(data_processor_thread, started daemon 13440872448)> │ └ <bound method DataProcessor.run of <core.runnables.data_processor.DataProcessor object at 0x16b1f2500>> └ <Thread(data_processor_thread, started daemon 13440872448)> File "/Users/andy/Documents/Projects/texera/core/amber/src/main/python/core/runnables/data_processor.py", line 58, in run self.process_internal_marker(marker) │ │ └ <core.models.internal_marker.EndChannel object at 0x16da2cd60> │ └ <function DataProcessor.process_internal_marker at 0x16b18cb80> └ <core.runnables.data_processor.DataProcessor object at 0x16b1f2500> > File "/Users/andy/Documents/Projects/texera/core/amber/src/main/python/core/runnables/data_processor.py", line 80, in process_internal_marker self._set_output_tuple(executor.on_finish(port_id)) │ │ │ │ └ 0 │ │ │ └ <function SourceOperator.on_finish at 0x1679b8c10> │ │ └ <udf-v1.GeneratorOperator object at 0x16b1f3f40> │ └ <function DataProcessor._set_output_tuple at 0x16b18cd30> └ <core.runnables.data_processor.DataProcessor object at 0x16b1f2500> File "/Users/andy/Documents/Projects/texera/core/amber/src/main/python/core/runnables/data_processor.py", line 147, in _set_output_tuple output_tuple.finalize( │ └ <function Tuple.finalize at 0x1679ae680> └ Tuple['test': [1, 2, 3]] File "/Users/andy/Documents/Projects/texera/core/amber/src/main/python/core/models/tuple.py", line 263, in finalize self.validate_schema(schema) │ │ └ <core.models.schema.schema.Schema object at 0x16d3925f0> │ └ <function Tuple.validate_schema at 0x1679ae7a0> └ Tuple['test': [1, 2, 3]] File "/Users/andy/Documents/Projects/texera/core/amber/src/main/python/core/models/tuple.py", line 329, in validate_schema raise TypeError( TypeError: Unmatched type for field 'test', expected AttributeType.INT, got [1, 2, 3] (<class 'list'>) instead. ``` This example should be written as ``` class GeneratorOperator(UDFSourceOperator): @overrides def produce(self) -> Iterator[Union[TupleLike, TableLike, None]]: for i in [1, 2, 3]: yield {"test": i} ``` <img width="3054" height="1690" alt="image" src="https://github.com/user-attachments/assets/7bcf37f0-1ca4-4ddb-89a5-7c5cb2594eb6" /> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
