zendesk-kjaanson commented on issue #32461: URL: https://github.com/apache/beam/issues/32461#issuecomment-2615291417
I am adding my experience trying to use PubsubIO with FlinkRunner and Python for the past few months. - Getting PubsubIO working via expansion transform requires adding jobserver jar manually to the expansion service classpath, otherwise it is not registered and trying to use this will give an error regarding transform URN not found. Quick peek into the underlying code shoed that it is not using `ExternalTransform` base class as KafkaIO is using? Not sure whats happening there. - There is some kind of bug when trying to send `PubsubMessage` with attributes. Can't remember what was the issue since I simply wanted to get the thing working and did not need attribute sending actually. - After getting PubsubIO seemingly functional in Python FlinkRunner pipeline, then small amounts of data (1-40 messages per min) go thrgouh, but if there is any volume that starts to resemble possible production volume the pipeline will consume messages until it hits checkpoint interval limit and then fail processing and start consuming messages from the start. Some of the processed messages will get sent to sink, repeatedly. I played around with different combinations of checkpointing intervals and pubsub ack deadlines but nothing really helped. In the end I switched to KafkaIO and that works nicely (when using the `use_deprecated_read` mode). For now I don't think PubsubIO is in any way usable with FlinkRunner. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
