alexmreis commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1416594045

   The implementation of Kafka in the Python SDK + Portable Runner is 
unfortunately rather broken for streaming use cases. I don't understand why 
there isn't a native Python implementation based on 
https://github.com/confluentinc/confluent-kafka-python that doesn't have to 
deal with the portability layer.  It would be much more reliable, even if maybe 
less capable of parallel compute. 
   
   Our company has abandoned Beam and Dataflow for this very reason. The last bug I 
opened, #22809 in August 2022, was closed today but still depends on two other 
issues, one of which, #25114, remains unsolved half a year later. The Python SDK 
is clearly not a priority for the core team. Maybe they're too busy focusing on 
GCP-specific products like PubSub to put in the effort to make open source 
tools, like Kafka, work properly in Beam's Python SDK. There isn't even a 
single unit test in the test suite for an unbounded Kafka stream being windowed 
and keyed.
   
   As someone who really believes in Beam as a great portable standard for data 
engineering, it's sad to see the lack of interest from the core team in 
anything that is not making Google money (although we would still be paying for 
Dataflow if it worked).

