zendesk-kjaanson commented on issue #32461:
URL: https://github.com/apache/beam/issues/32461#issuecomment-2615291417

   I am adding my experience trying to use PubsubIO with FlinkRunner and Python 
for the past few months.
   
   - Getting PubsubIO working via expansion transform requires adding jobserver 
jar manually to the expansion service classpath, otherwise it is not registered 
and trying to use this will give an error regarding transform URN not found. 
Quick peek into the underlying code shoed that it is not using 
`ExternalTransform` base class as KafkaIO is using? Not sure whats happening 
there.
   - There is some kind of bug when trying to send `PubsubMessage` with 
attributes. Can't remember what was the issue since I simply wanted to get the 
thing working and did not need attribute sending actually.
   - After getting PubsubIO seemingly functional in Python FlinkRunner 
pipeline, then small amounts of data (1-40 messages per min) go thrgouh, but if 
there is any volume that starts to resemble possible production volume the 
pipeline will consume messages until it hits checkpoint interval limit and then 
fail processing and start consuming messages from the start. Some of the 
processed messages will get sent to sink, repeatedly. I played around with 
different combinations of checkpointing intervals and pubsub ack deadlines but 
nothing really helped.
   
   In the end I switched to KafkaIO and that works nicely (when using the 
`use_deprecated_read` mode).
   
   For now I don't think PubsubIO is in any way usable with FlinkRunner.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to